LSE Data Analytics Online Career Accelerator¶

National Health Service (NHS): Resource Utilisation and Capacity Assessment¶

Introduction

This analysis examines non-standardised data collected from GP surgeries to address two critical NHS operational questions:

  • Has there been adequate staff and capacity in the networks?
  • What was the actual utilisation of resources?

The focus centres on analysing appointment data (missed and attended) to identify patterns, trends, and drivers that can produce actionable recommendations for improving services and resource allocation.

Research / Exploratory Questions

  • What is the date range and scope of the provided datasets?
  • What is the distribution of locations, service settings, context types, national categories, and appointment statuses?
  • Are there monthly/seasonal trends in appointments and cancellations?
  • Which periods show the highest appointment volumes and cancellation rates?
  • What is the typical appointment duration across different services?
  • How do external factors correlate with appointment patterns?

Analytical Questions

  • What are the primary drivers of missed appointments and can this be predicted?
  • Are there capacity shortages and resource utilisation issues?
  • What evidence-based recommendations can improve attendance rates?
  • Can the NHS adjust staffing and scheduling to optimise both efficiency and outcomes?
In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime


import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

Categorical Ordering¶

In [17]:
# Define duration order
min_order = ['1-5 ', '6-10 ', '11-15 ', '16-20 ', '21-30 ', '31-60 ']
In [18]:
# Define Appointment Order
app_order = ['6','0','2','4','1','3','2']
In [19]:
# Define categorical order
day_order2 = ['Same Day', '1 Day', '2 to 7 Days', '8  to 14 Days', 
         '15  to 21 Days', '22  to 28 Days', 'More than 28 Days']
In [20]:
# Define day order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
In [21]:
# Define your month order
month_order = ["August", "September", "October", "November", "December",
               "January", "February", "March", "April", "May", "June", "July"]

Defined Functions¶

In [22]:
def format_plotly_chart(fig):
    """Simple Plotly formatting function - aesthetics only"""
    return fig.update_layout(
        xaxis_tickangle=45,
        width=800,
        height=400,
        font_size=12,
        showlegend=True,
        plot_bgcolor='white',
        paper_bgcolor='white',
        xaxis=dict(
            categoryorder='array',
            categoryarray=month_order
        )
    )
In [23]:
# Define lead time order function that accepts a DataFrame
def lead_time_order(df):
    lead_times = [
        'Same Day',
        '1 Day',
        '2 to 7 Days',
        '8  to 14 Days',
        '15  to 21 Days',
        '22  to 28 Days',
        'More than 28 Days'
    ]
    
    # Convert booking_lead_time to categorical with specified order
    df['booking_lead_time'] = pd.Categorical(
        df['booking_lead_time'], 
        categories=lead_times, 
        ordered=True
    )
    
    # Sort by the categorical column
    return df.sort_values('booking_lead_time')
In [24]:
def pivot_function(
    data,
    group_by_column,
    outcome_column,
    count_column,
    time_column,
    target_outcome=None,
    group_values=None,
    flatten_columns=False
):
    # Basic validation
    for col in [group_by_column, outcome_column, count_column, time_column]:
        if col not in data.columns:
            raise ValueError(f"Column '{col}' not found in data.")

    # Filter out null or 'Unknown' values
    filtered_data = data.dropna(subset=[outcome_column, group_by_column])
    filtered_data = filtered_data[
        (filtered_data[outcome_column] != 'Unknown') &
        (filtered_data[group_by_column] != 'Unknown')
    ]

    # Optional filtering by group values
    if group_values:
        filtered_data = filtered_data[filtered_data[group_by_column].isin(group_values)]

    # Create pivot table
    pivot = pd.pivot_table(
        filtered_data,
        values=count_column,
        index=time_column,
        columns=[group_by_column, outcome_column],
        aggfunc='sum'
    )

    if pivot.empty:
        raise ValueError("The pivot table is empty after filtering. Check your filters or input data.")

    # Auto-detect target outcome if not specified
    if target_outcome is None:
        target_outcome = sorted(pivot.columns.get_level_values(1).unique())[0]

    # Add percentage columns
    for group_value in pivot.columns.get_level_values(0).unique():
        group_cols = pivot[group_value].columns
        if target_outcome in group_cols:
            total = pivot[group_value].sum(axis=1)
            percentage = (pivot[group_value][target_outcome] / total * 100).round(2)
            pivot[(group_value, f'{target_outcome} (%)')] = percentage

    # Optional: flatten MultiIndex columns for easier export or plotting
    if flatten_columns:
        pivot.columns = ['_'.join(map(str, col)).strip() for col in pivot.columns.values]

    return pivot

Standardise Plot Settings¶

In [25]:
def plot_format():  
# Use Seaborn's color-blind-friendly palette
    sns.set_palette("colorblind")
    # Apply Seaborn's "darkgrid" style
    sns.set_style("darkgrid")
# Set general figure aesthetics
plt.rcParams.update({
    "figure.figsize": (7,4.67),       # Figure size
    "axes.titlesize": 14,          # Title font size
    "axes.labelsize": 12,          # Label font size
    "xtick.labelsize": 10,         # X-axis tick font size
    "ytick.labelsize": 10,         # Y-axis tick font size
    "legend.fontsize": 10,         # Legend font size
    "grid.color": "gray",          # Grid color
    "grid.linestyle": "--",        # Dashed grid lines
    "grid.alpha": 0.5,             # Grid transparency
    "axes.grid": True,             # Enable grid
    "axes.edgecolor": "black",     # Axis border color
    "axes.spines.top": False,      # Hide top spine
    "axes.spines.right": False,    # Hide right spine
    "xtick.direction": "out",      # Outward ticks
    "ytick.direction": "out",
    "font.family": "DejaVu Sans",  # Font choice (good for readability)
    "font.size": 12,
    })
In [26]:
# defined function for identifying 'unknowns'
def unknown_summary(df):
    import pandas as pd
    result = pd.DataFrame({
        'Count': df.eq("Unknown").sum(),
        '%': (df.eq("Unknown").sum() / len(df) * 100).round(1)
    })
    print(result.to_string())

Importation of three datasets¶

In [30]:
# Import and sense-check the actual_duration.xlsx data set as ad.
ad = pd.read_excel('actual_duration_clean.xlsx')               
In [31]:
# Import the national_categories.xlsx data set as nc.
nc = pd.read_excel('national_categories_clean.xlsx')
In [32]:
# Import the appointments_regional file
ar = pd.read_excel('appointments_regional_clean.xlsx')

Data Validation¶

In [33]:
print(ad.shape[0])
print(nc.shape[0])
print(ar.shape[0])
137793
817394
596821

Merge will not be viable as datasets are different lengths

In [34]:
# Assess metadata 
print(ad.dtypes)
print(nc.dtypes)
print(ar.dtypes)
sub_icb_code                 object
sub_icb_code2                object
icb_location_name            object
icb_ons_code                 object
region_ons_code              object
appointment_date     datetime64[ns]
duration                     object
appointment_count             int64
dtype: object
appointment_date     datetime64[ns]
icb_ons_code                 object
icb_location_name            object
sub_icb_code                 object
service_setting              object
context_type                 object
national_category            object
appointment_count             int64
appointment_month            object
dtype: object
icb_ons_code          object
appointment_month     object
appointment_status    object
hcp_type              object
appointment_mode      object
booking_lead_time     object
appointment_count      int64
dtype: object

Note: appointment month is an 'object' in appointments dataset.

In [36]:
# Check the min and max dates.
print(ad['appointment_date'].min())
print(ad['appointment_date'].max())
print(nc['appointment_date'].min())
print(nc['appointment_date'].max())
print(ar['appointment_month'].min())
print(ar['appointment_month'].max())
2021-12-01 00:00:00
2022-06-30 00:00:00
2021-08-01 00:00:00
2022-06-30 00:00:00
2020-01
2022-06

Note: start dates differ in the three datasets. Appointments were scheduled between August 21' and June 22'

Data Cleaning (all datasets)¶

In [37]:
# Check for missing values
print(ad.isnull().sum())
print(nc.isnull().sum())
print(ar.isnull().sum())
sub_icb_code         0
sub_icb_code2        0
icb_location_name    0
icb_ons_code         0
region_ons_code      0
appointment_date     0
duration             0
appointment_count    0
dtype: int64
appointment_date     0
icb_ons_code         0
icb_location_name    0
sub_icb_code         0
service_setting      0
context_type         0
national_category    0
appointment_count    0
appointment_month    0
dtype: int64
icb_ons_code          0
appointment_month     0
appointment_status    0
hcp_type              0
appointment_mode      0
booking_lead_time     0
appointment_count     0
dtype: int64
In [38]:
# Delete all duplicated rows
ar.drop_duplicates(keep='first',inplace=True)

Note: regional appointment dataset contained 21,604 duplicated rows.

In [39]:
# Determine duplicated rows
print(ad.duplicated().sum())
print(nc.duplicated().sum())
print(ar.duplicated().sum())
0
0
0
In [40]:
unknown_summary(nc)
                   Count    %
appointment_date       0  0.0
icb_ons_code           0  0.0
icb_location_name      0  0.0
sub_icb_code           0  0.0
service_setting        0  0.0
context_type           0  0.0
national_category      0  0.0
appointment_count      0  0.0
appointment_month      0  0.0
In [41]:
unknown_summary(ad)
                   Count     %
sub_icb_code           0   0.0
sub_icb_code2          0   0.0
icb_location_name      0   0.0
icb_ons_code           0   0.0
region_ons_code        0   0.0
appointment_date       0   0.0
duration           20161  14.6
appointment_count      0   0.0
In [42]:
unknown_summary(ar)
                     Count     %
icb_ons_code             0   0.0
appointment_month        0   0.0
appointment_status  193374  33.6
hcp_type            122958  21.4
appointment_mode     77918  13.5
booking_lead_time    26696   4.6
appointment_count        0   0.0

Data Transformation:¶

Actual duration dataset¶

In [43]:
# Create new 'month' and 'day' columns
ad['month'] = ad['appointment_date'].dt.month_name()
ad['day']=ad['appointment_date'].dt.day_name()
ad.head()
Out[43]:
sub_icb_code sub_icb_code2 icb_location_name icb_ons_code region_ons_code appointment_date duration appointment_count month day
0 00L E38000130 NHS North East and North Cumbria E54000050 E40000012 2021-12-01 31-60 Minutes 364 December Wednesday
1 00L E38000130 NHS North East and North Cumbria E54000050 E40000012 2021-12-01 21-30 Minutes 619 December Wednesday
2 00L E38000130 NHS North East and North Cumbria E54000050 E40000012 2021-12-01 6-10 Minutes 1698 December Wednesday
3 00L E38000130 NHS North East and North Cumbria E54000050 E40000012 2021-12-01 Unknown 1277 December Wednesday
4 00L E38000130 NHS North East and North Cumbria E54000050 E40000012 2021-12-01 16-20 Minutes 730 December Wednesday
In [44]:
# Determine / confirm number of regional codes
print(ad['region_ons_code'].unique())
print(ad['region_ons_code'].nunique())
['E40000012' 'E40000010' 'E40000011' 'E40000007' 'E40000005' 'E40000006'
 'E40000003']
7
In [45]:
# Replace codes with names and rename column header

region_map = {
    'E40000012': 'NE & Yorkshire',
    'E40000007': 'East of England',
    'E40000003': 'London',
    'E40000011': 'Midlands',
    'E40000010': 'North West',
    'E40000005': 'South East',
    'E40000006': 'South West'
}

ad['region_ons_code'] = ad['region_ons_code'].replace(region_map)
ad.rename({'region_ons_code' : 'region'},axis=1, inplace=True)
In [46]:
ad.head()
Out[46]:
sub_icb_code sub_icb_code2 icb_location_name icb_ons_code region appointment_date duration appointment_count month day
0 00L E38000130 NHS North East and North Cumbria E54000050 NE & Yorkshire 2021-12-01 31-60 Minutes 364 December Wednesday
1 00L E38000130 NHS North East and North Cumbria E54000050 NE & Yorkshire 2021-12-01 21-30 Minutes 619 December Wednesday
2 00L E38000130 NHS North East and North Cumbria E54000050 NE & Yorkshire 2021-12-01 6-10 Minutes 1698 December Wednesday
3 00L E38000130 NHS North East and North Cumbria E54000050 NE & Yorkshire 2021-12-01 Unknown 1277 December Wednesday
4 00L E38000130 NHS North East and North Cumbria E54000050 NE & Yorkshire 2021-12-01 16-20 Minutes 730 December Wednesday

Data transformation:¶

Regional appointments dataset¶

In [47]:
# Create new month and day columns for plotting later
ar['month'] = ar['appointment_month'].apply(lambda x: datetime.strptime(x, '%Y-%m').strftime('%B'))
In [48]:
# Count number of 42 icb codes
ar['icb_ons_code'].nunique()
Out[48]:
42
In [49]:
# Convert 42 icb codes to named locations
# East of Eng(07), London(03), Midlands(11), NE and Yorkshire (12), North West(10), South East(05), South West(06)
ar.loc[ar['icb_ons_code'].isin(['E54000050','E54000051','E54000054','E54000061']),'region'] = 'North East & Yorkshire'
ar.loc[ar['icb_ons_code'].isin(['E54000008','E54000048','E54000057']),'region'] = 'North West'
ar.loc[ar['icb_ons_code'].isin(['E54000030','E54000031','E54000029','E54000028','E54000027']),'region'] = 'London'
ar.loc[ar['icb_ons_code'].isin(['E54000053','E54000052','E54000044','E54000042',
                                'E54000034','E54000032']),'region'] = 'South East'
ar.loc[ar['icb_ons_code'].isin(['E54000036','E54000037','E54000038','E54000039',
                                'E54000040','E54000041','E54000043']),'region'] = 'South West'
ar.loc[ar['icb_ons_code'].isin(['E54000010','E54000011','E54000013','E54000015',
                                'E54000018','E54000019','E54000055','E54000058',
                                'E54000059','E54000060','E54000062']),'region'] = 'Midlands'
ar.loc[ar['icb_ons_code'].isin(['E54000056','E54000022','E54000023','E54000024'
                                ,'E54000025','E54000026']),'region'] = 'East of England'
In [50]:
# Filter the data set to only look at data from 2021-08 onwards
ar = ar[ar['appointment_month'] >= '2021-08']
In [51]:
# Delete / Filter out rows with 'Unknown' values in 'Booking Lead Time' column only
ar = ar[ar['booking_lead_time']!='Unknown']
In [52]:
# Filter out'Unknowns' (for heatmaps)
ad = ad[ad['duration']!='Unknown']
In [53]:
# Check 'Unknown' values have been removed
unknown_counts = ad.eq("Unknown").sum()
unknown_counts
Out[53]:
sub_icb_code         0
sub_icb_code2        0
icb_location_name    0
icb_ons_code         0
region               0
appointment_date     0
duration             0
appointment_count    0
month                0
day                  0
dtype: int64
In [54]:
# Remove strings from duration column (for heatmaps)
ad['duration'] = ad['duration'].str.replace('Minutes','')
# Determine average for the two numbers
ad['duration_float'] = ad['duration'].apply(lambda x: sum(map(float, x.split('-'))))/2

Data Transformation:¶

National categories dataset¶

In [55]:
# Create 'month'column
nc['month'] = nc['appointment_date'].dt.month_name()
# Create 'day column
nc['day']=nc['appointment_date'].dt.day_name()
# View top of DataFrame.
nc.head()
Out[55]:
appointment_date icb_ons_code icb_location_name sub_icb_code service_setting context_type national_category appointment_count appointment_month month day
0 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Primary Care Network Care Related Encounter Patient contact during Care Home Round 3 2021-08 August Monday
1 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Other Care Related Encounter Planned Clinics 7 2021-08 August Monday
2 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Home Visit 79 2021-08 August Monday
3 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter General Consultation Acute 725 2021-08 August Monday
4 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Structured Medication Review 2 2021-08 August Monday
In [56]:
# Create a 'region' column
nc['region']="1"
In [57]:
# Identify number of unique codes
print(nc['icb_ons_code'].unique())
print(nc['icb_ons_code'].nunique())
['E54000050' 'E54000048' 'E54000057' 'E54000008' 'E54000061' 'E54000060'
 'E54000054' 'E54000051' 'E54000015' 'E54000010' 'E54000056' 'E54000025'
 'E54000023' 'E54000026' 'E54000053' 'E54000044' 'E54000042' 'E54000041'
 'E54000043' 'E54000036' 'E54000038' 'E54000039' 'E54000055' 'E54000058'
 'E54000037' 'E54000019' 'E54000022' 'E54000031' 'E54000013' 'E54000030'
 'E54000059' 'E54000032' 'E54000052' 'E54000040' 'E54000028' 'E54000029'
 'E54000018' 'E54000062' 'E54000034' 'E54000024' 'E54000011' 'E54000027']
42
In [58]:
# Populate 'region' column based on 42 icb codes and 7 region codes
nc.loc[nc['icb_ons_code'].isin(['E54000050','E54000051','E54000054','E54000061']),'region'] = 'North East & Yorkshire'
nc.loc[nc['icb_ons_code'].isin(['E54000008','E54000048','E54000057']),'region'] = 'North West'
nc.loc[nc['icb_ons_code'].isin(['E54000030','E54000031','E54000029','E54000028',
                                'E54000027']),'region'] = 'London'
nc.loc[nc['icb_ons_code'].isin(['E54000053','E54000052','E54000044','E54000042',
                                'E54000034','E54000032']),'region'] = 'South East'
nc.loc[nc['icb_ons_code'].isin(['E54000036','E54000037','E54000038','E54000039',
                                'E54000040','E54000041','E54000043']),'region'] = 'South West'
nc.loc[nc['icb_ons_code'].isin(['E54000010','E54000011','E54000013','E54000015',
                                'E54000018','E54000019','E54000055','E54000058',
                                'E54000059','E54000060','E54000062']),'region'] = 'Midlands'
nc.loc[nc['icb_ons_code'].isin(['E54000056','E54000022','E54000023','E54000024'
                                ,'E54000025','E54000026']),'region'] = 'East of England'
nc.head()
Out[58]:
appointment_date icb_ons_code icb_location_name sub_icb_code service_setting context_type national_category appointment_count appointment_month month day region
0 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Primary Care Network Care Related Encounter Patient contact during Care Home Round 3 2021-08 August Monday North East & Yorkshire
1 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Other Care Related Encounter Planned Clinics 7 2021-08 August Monday North East & Yorkshire
2 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Home Visit 79 2021-08 August Monday North East & Yorkshire
3 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter General Consultation Acute 725 2021-08 August Monday North East & Yorkshire
4 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Structured Medication Review 2 2021-08 August Monday North East & Yorkshire

Initial Data Exploration¶

In [59]:
# Create a new subset
ad_regional = ad[['icb_location_name','region','appointment_count','month','day','duration_float']].copy()
In [60]:
# Create plot to see regions by month and day of week
sns.relplot(data=ad_regional, x='month',y='appointment_count',hue='region',kind='line',col='day')
plt.show()
No description has been provided for this image
In [61]:
# Create plot to see regions by day
sns.relplot(data=ad_regional, x='month',y='appointment_count',hue='icb_location_name',kind='line',col='region',
            row='day')
plt.show()
No description has been provided for this image
In [62]:
# Facet by year and service setting
nc['year'] = pd.to_datetime(nc['appointment_date']).dt.year

service_order = nc['service_setting'].unique()

g = sns.FacetGrid(nc, col='year', col_wrap=3)
g.map(sns.boxplot, 'service_setting',
      'appointment_count', 
      order=service_order)
g.set_xticklabels(rotation=90)
plt.show()
No description has been provided for this image
In [63]:
# Facet by region and service setting
service_order = nc['service_setting'].unique()

g = sns.FacetGrid(nc, col='region', col_wrap=3)
g.map(sns.boxplot, 'service_setting',
      'appointment_count', 
      order=service_order)
g.set_xticklabels(rotation=90)
plt.show()
No description has been provided for this image

Daily appointment count¶

In [64]:
# Appointments by day (nc dataset)
day_stats = nc.groupby('day')['appointment_count'].sum().sort_values(ascending=False)
result = pd.DataFrame({
    'count': day_stats,
    'percentage': (day_stats / day_stats.sum()).round(3) * 100
})
print(result)
              count  percentage
day                            
Tuesday    61806933        20.9
Monday     59695267        20.2
Wednesday  58984265        19.9
Thursday   56976354        19.2
Friday     52394868        17.7
Saturday    5574922         1.9
Sunday       614161         0.2
In [65]:
# Daily appointments (Ad dataset)
plot_format()

# Create empty plot
fig, ax = plt.subplots()
plt.title('Daily Appointment Count')

# Use weights parameter to sum appointment_count for each day
ax.hist(ad['day'], bins=7, weights=ad['appointment_count'])

# Add labels
ax.set_ylabel('Total Appointments')
ax.set_xlabel('Day of Week')

plt.xticks(rotation=45)
plt.tight_layout()
No description has been provided for this image

Monthly appointments and records¶

In [66]:
# Number of appointments per month (sum of count_of_appointments by month).
records = (nc.groupby('month')['appointment_count'].agg([
    ('Appointments' , 'sum'),
    ('Records' , 'count')
]).assign(
        pct = lambda x: (x['Appointments'] / x['Appointments'].sum())*100
).round(1)
.sort_values('Appointments',ascending=False))
print(records)
           Appointments  Records   pct
month                                 
November       30405070    77652  10.3
October        30303834    74078  10.2
March          29595038    82822  10.0
September      28522501    74922   9.6
May            27495508    77425   9.3
June           25828078    74168   8.7
January        25635474    71896   8.7
February       25355260    71769   8.6
December       25140776    72651   8.5
April          23913060    70012   8.1
August         23852171    69999   8.1

Most appointments are in March, October and November

Service Settings (appointments and records)¶

In [67]:
service_count = (nc.groupby('service_setting')['appointment_count'].agg([
    ('Records', 'count'), 
    ('Appointments', 'sum')])
    .assign(percentage=lambda x: (x['Appointments'] / x['Appointments'].sum()) * 100)).round(1)
print(service_count)
                           Records  Appointments  percentage
service_setting                                             
Extended Access Provision   108122       2176807         0.7
General Practice            359274     270811691        91.5
Other                       138789       5420076         1.8
Primary Care Network        183790       6557386         2.2
Unmapped                     27419      11080810         3.7

Majority of appointments are within a General Practice

National Categories (appointments)¶

In [68]:
# Count records and appointments
category_count = (nc.groupby('national_category')['appointment_count'].agg([
    ('Appointments' , 'sum')
]).assign(pct=lambda x: x['Appointments'] / x['Appointments'].sum()) * 100
                 ).round(1)
print(category_count)
                                                    Appointments   pct
national_category                                                     
Care Home Needs Assessment & Personalised Care ...      40590400   0.1
Care Home Visit                                         62827900   0.2
Clinical Triage                                       4154696400  14.0
General Consultation Acute                            5369115000  18.1
General Consultation Routine                          9727152200  32.9
Group Consultation and Group Education                   6063200   0.0
Home Visit                                             214445200   0.7
Inconsistent Mapping                                  2789080200   9.4
Non-contractual chargeable work                         13891100   0.0
Patient contact during Care Home Round                  81033000   0.3
Planned Clinical Procedure                            2570269400   8.7
Planned Clinics                                       2801974800   9.5
Service provided by organisation external to th...      85213300   0.3
Social Prescribing Service                              47582800   0.2
Structured Medication Review                           185837900   0.6
Unmapped                                              1108081000   3.7
Unplanned Clinical Activity                            305579400   1.0
Walk-in                                                 41243800   0.1

Most appointments are general consultations but 14% are clinical triages which tend to occur when services are under pressure

Context Types (appointments and records)¶

In [69]:
# Count records and appointments
context_count = nc.groupby('context_type')['appointment_count'].agg([
    ('Appointments' , 'sum'),
    ('Records' , 'count')
])
print(context_count)
                        Appointments  Records
context_type                                 
Care Related Encounter     257075158   700481
Inconsistent Mapping        27890802    89494
Unmapped                    11080810    27419

Day & appointment length¶

In [70]:
# Create pivot table (for heatmap)
duration_pivot = (
    ad.pivot_table(
        values='appointment_count',
        index='duration',
        columns='day',
        aggfunc='sum'
    )
    .fillna(0)
    .reindex(columns=day_order)  # Reorder columns by day
    .reindex(min_order)  # Reorder rows by duration
)

duration_pivot
Out[70]:
day Monday Tuesday Wednesday Thursday Friday Saturday Sunday
duration
1-5 5748884 5909039 5977117 5621236 5112844 191952 39793
6-10 6813610 7015810 7062545 6687714 6027435 159258 34443
11-15 5062230 5259482 5225380 5006471 4466494 118320 22505
16-20 3200248 3359463 3319817 3202112 2836363 73074 13170
21-30 2968529 3168346 3112977 3029930 2670309 63853 12421
31-60 1760412 1917537 1886506 1851003 1642298 37312 8364
In [71]:
# Simple Plotly heatmap
fig = go.Figure(data=go.Heatmap(
    z=duration_pivot / 1000000,
    x=duration_pivot.columns,  # Extract column labels (days)
    y=duration_pivot.index,    # Extract row labels (duration ranges)
    colorscale='RdBu_r',
    showscale=True,
    text=(duration_pivot / 1000000).round(1),
    texttemplate="%{z:.1f}",
    textfont={"size": 12}
))

fig.update_layout(
    title='Appointment Count in Millions',
    xaxis_title="Day of Week",
    yaxis_title="Duration (minutes)"  
)

fig.show()
In [72]:
# Create 2nd pivot table (for heatmaps)
region_pivot = pd.pivot_table(
    data=ad,
    values='appointment_count',
    index='duration',
    columns='region',
    aggfunc='sum'
)
region_pivot = region_pivot.reindex(min_order) # re-order y axis
region_pivot
Out[72]:
region East of England London Midlands NE & Yorkshire North West South East South West
duration
1-5 3651451 3782616 5726686 4855371 2958938 4411714 3214089
6-10 4214200 4387847 6856802 5772736 3568966 5130150 3870114
11-15 3084578 3408083 5023018 4278092 2779066 3737221 2850824
16-20 1917654 2173747 3137413 2705941 1806421 2403073 1859998
21-30 1757527 2045572 2893048 2501396 1727293 2297229 1804300
31-60 1040916 1358224 1722190 1441085 1039191 1408854 1092972

Monthly analysis not possible due to missing data from August to November

Exploration: Region and appointment length)¶

In [73]:
# Simple Plotly heatmap
fig = go.Figure(data=go.Heatmap(
    z=region_pivot / 1000000,
    x=region_pivot.columns,  # Extract column labels (days)
    y=region_pivot.index,    # Extract row labels (duration ranges)
    colorscale='RdBu_r',
    showscale=True,
    text=(duration_pivot / 1000000).round(1),
    texttemplate="%{z:.1f}",
    textfont={"size": 12}
))

fig.update_layout(
    title='Appointment Count in Millions',
    xaxis_title="Day of Week",
    yaxis_title="Duration (minutes)"  
)

fig.show()

Most appointments are 6-10 minutes long, occur on Tuesday and Wednesday in the Midlands

Appointment lengths¶

In [74]:
# Estimate mean meeting duration
average_duration = (ad['duration_float'].mean().round(2))
print(f"The estimated average appointment is an estimated {average_duration} minutes.")
The estimated average appointment is an estimated 18.64 minutes.
In [75]:
# Create boxplot to show distribution of appointment lengths
plot_format()
sns.violinplot(data=ad['duration_float'])
plt.title('Distribution of Appointment Lengths')
plt.ylabel('Duration (minutes)')
plt.show()
 #so the longer appointments may relate to something specific.
No description has been provided for this image

There is a skew towards shorter appointments, but also no appointments between 30 and 40 minutes?

In [76]:
# Determine skewness of appointment count -1/+1 optimal, -2/+2 acceptable Curran et al. (1996)
skew = ad['duration_float'].skew().round(1)
print(f"The skewness is {skew} indicating a positive skew which reflects violin plot")
The skewness is 0.9 indicating a positive skew which reflects violin plot

Exploration: Categories and Values¶

In [77]:
# Determine the number of locations
print(f"There are {nc['region'].nunique()} NHS regions")
print(f"There are {nc['icb_ons_code'].nunique()} NHS sub-locations")
There are 7 NHS regions
There are 42 NHS sub-locations
In [78]:
most_apps = ad.groupby('icb_location_name').agg({
    'appointment_count': 'sum'
}).sort_values('appointment_count', ascending=False)
print(f"The sub-locations with most appointments are:\n{most_apps.head(5)}")
The sub-locations with most appointments are:
                                   appointment_count
icb_location_name                                   
NHS North East and North Cumbria             7450463
NHS West Yorkshire                           6418497
NHS Greater Manchester                       5494459
NHS North West London                        5230192
NHS Cheshire and Merseyside                  4978737
In [79]:
most_apps = nc.groupby('icb_location_name').agg({
    'appointment_count': 'sum'
}).sort_values('appointment_count', ascending=False)
print(f"The sub-locations with most appointments are:\n{most_apps.head(5)}")
The sub-locations with most appointments are:
                                      appointment_count
icb_location_name                                      
NHS North East and North Cumbria ICB           16882235
NHS West Yorkshire ICB                         14358371
NHS Greater Manchester ICB                     13857900
NHS Cheshire and Merseyside ICB                13250311
NHS North West London ICB                      12142390
In [80]:
print(f"There are {nc['service_setting'].nunique()} service settings. These are:")
print(nc['service_setting'].unique())
There are 5 service settings. These are:
['Primary Care Network' 'Other' 'General Practice' 'Unmapped'
 'Extended Access Provision']
In [81]:
print(f"There are {(nc['context_type'].nunique())} context types. These are:")
print(nc['context_type'].unique())
There are 3 context types. These are:
['Care Related Encounter' 'Unmapped' 'Inconsistent Mapping']
In [82]:
# Determine the number of national categories.
print(f"There are {(nc['national_category'].nunique())} national categories. These are:")
print(nc['national_category'].unique())
There are 18 national categories. These are:
['Patient contact during Care Home Round' 'Planned Clinics' 'Home Visit'
 'General Consultation Acute' 'Structured Medication Review'
 'Care Home Visit' 'Unmapped' 'Clinical Triage'
 'Planned Clinical Procedure' 'Inconsistent Mapping'
 'Care Home Needs Assessment & Personalised Care and Support Planning'
 'General Consultation Routine'
 'Service provided by organisation external to the practice'
 'Unplanned Clinical Activity' 'Social Prescribing Service'
 'Non-contractual chargeable work'
 'Group Consultation and Group Education' 'Walk-in']
In [83]:
# Determine unique values within each column
unique = nc.select_dtypes(include=['object', 'category']).nunique().to_dict()
unique
Out[83]:
{'icb_ons_code': 42,
 'icb_location_name': 42,
 'sub_icb_code': 106,
 'service_setting': 5,
 'context_type': 3,
 'national_category': 18,
 'appointment_month': 11,
 'month': 11,
 'day': 7,
 'region': 7}
In [84]:
print(ar['month'].unique())
['August' 'September' 'October' 'November' 'December' 'January' 'February'
 'March' 'April' 'May' 'June']

Note: July is missing from the data

In [85]:
unique = ar.select_dtypes(include=['object', 'category']).nunique().to_dict()
unique
Out[85]:
{'icb_ons_code': 42,
 'appointment_month': 11,
 'appointment_status': 3,
 'hcp_type': 3,
 'appointment_mode': 5,
 'booking_lead_time': 7,
 'month': 11,
 'region': 7}
In [86]:
# Determine the number of appointment statuses.
print(ar['appointment_status'].unique())
print(ar['hcp_type'].unique())
print(ar['appointment_mode'].unique())
['Attended' 'DNA' 'Unknown']
['GP' 'Other Practice staff' 'Unknown']
['Home Visit' 'Video/Online' 'Face-to-Face' 'Unknown' 'Telephone']

Most appointments occur between Sept - Nov and in March

Regional analysis¶

In [87]:
# create a new subset
nc_regional = nc[['icb_location_name','region','appointment_count','month','day']].copy()
In [88]:
# Create plot
plot_format()
sns.lineplot(data=nc_regional,
             x='month',
             y='appointment_count',
             hue='region',
             errorbar=None)
plt.title('Regional appointment count')
plt.xticks(rotation=45)
plt.legend(loc='upper right')
plt.legend(
    bbox_to_anchor=(1.05, 1),
    loc='upper left');
plt.show()
#plt.savefig('slide_6.png',dpi=150, bbox_inches='tight')
No description has been provided for this image
In [89]:
# create a new subset for regional analysis of ad data.
ar_regional = ar[['appointment_status', 'hcp_type', 'appointment_mode', 
                  'booking_lead_time', 'appointment_count', 'month']].copy()
# view the subset
ar_regional.head()
Out[89]:
appointment_status hcp_type appointment_mode booking_lead_time appointment_count month
350 Attended GP Home Visit More than 28 Days 1 August
351 Attended GP Video/Online Same Day 1 August
352 Attended Other Practice staff Video/Online More than 28 Days 1 August
353 Attended Unknown Face-to-Face 8 to 14 Days 1 August
355 DNA GP Home Visit 1 Day 1 August
In [90]:
sns.lineplot(data=ar, x='month',y='appointment_count',hue='region',errorbar=None)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.8))
plt.title('Regional Appointments')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
No description has been provided for this image

Location and service setting¶

In [91]:
def analyze_location(location_name):
    return (nc
            .query(f"icb_location_name == '{location_name}'")
            .pipe(lambda df: df.groupby('service_setting')['appointment_count']
                  .agg([
                      ('Count', 'count'),
                      ('%', lambda x: (x.count() / len(df) * 100).round(2))
                  ])))
In [92]:
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
    loc: analyze_location(loc) 
    for loc in nc['icb_location_name'].unique()
})
In [93]:
heatmap_data = all_locations['%'].unstack().fillna(0)
In [94]:
fig = go.Figure(go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='YlGnBu',
    text=heatmap_data.round(1).astype(str),
    texttemplate="%{text}",  # Show labels on heatmap
    hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
    colorbar=dict(title="Percentage")
))

fig.update_layout(
    title="Service Setting by Location",
    height=1000,
    xaxis_tickangle=-45
)

fig.show()

Exploration: Location and context type¶

In [95]:
def analyze_location(location_name):
    return (nc
            .query(f"icb_location_name == '{location_name}'")
            .pipe(lambda df: df.groupby('context_type')['appointment_count']
                  .agg([
                      ('Count', 'count'),
                      ('%', lambda x: (x.count() / len(df) * 100).round(2))
                  ])))
In [96]:
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
    loc: analyze_location(loc) 
    for loc in nc['icb_location_name'].unique()
})
In [97]:
heatmap_data = all_locations['%'].unstack().fillna(0)
In [98]:
fig = go.Figure(go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='YlGnBu',
    text=heatmap_data.round(1).astype(str),
    texttemplate="%{text}",  # Show labels on heatmap
    hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
    colorbar=dict(title="Percentage")
))

fig.update_layout(
    title="Context types by Location",
    height=1000,
    xaxis_tickangle=-45
)

fig.show()

Location and National Categories¶

In [99]:
def analyze_location(location_name):
    return (nc
            .query(f"icb_location_name == '{location_name}'")
            .pipe(lambda df: df.groupby('national_category')['appointment_count']
                  .agg([
                      ('Count', 'count'),
                      ('%', lambda x: (x.count() / len(df) * 100).round(2))
                  ])))
In [100]:
# This creates a DataFrame with MultiIndex
all_locations = pd.concat({
    loc: analyze_location(loc) 
    for loc in nc['icb_location_name'].unique()
})
In [101]:
heatmap_data = all_locations['%'].unstack().fillna(0)
In [102]:
fig = go.Figure(go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns,
    y=heatmap_data.index,
    colorscale='YlGnBu',
    text=heatmap_data.round(1).astype(str),
    texttemplate="%{text}",  # Show labels on heatmap
    hovertemplate="Service: %{x}<br>Location: %{y}<br>%{z:.1f}%<extra></extra>",
    colorbar=dict(title="Percentage")
))

fig.update_layout(
    title="National Categories by Location",
    height=1000,
    xaxis_tickangle=-45
)

fig.show()

There is some variation across regions but nothing that stands out as anomalous

Appointment visualisations¶

Service settings / appointment count

In [103]:
# filter necessary columns
service = nc[['month','service_setting','appointment_count']]
# Create lineplot
plot_format()
sns.lineplot(data=service, 
             x='month',
             y='appointment_count',
             hue='service_setting', 
             errorbar=None)

plt.xticks(rotation=45)
plt.title('Number of appointments for service settings')
plt.show()
No description has been provided for this image

Context types / appointment count

In [104]:
# Create context type dataframe 
context = nc[['context_type','month','appointment_count']].copy()
In [105]:
# Create a lineplot.
plot_format()
sns.lineplot(data=context, 
             x='month', 
             hue='context_type',
             y='appointment_count',
             errorbar=None)

plt.xticks(rotation=45)
plt.title('Number of appointments for context types');
# Note: I don't think data is aggregated in chart below
No description has been provided for this image

National categories / appointment count

In [106]:
# Create dataframe
national = nc[['month','national_category','service_setting','context_type','appointment_count']].copy()
In [107]:
# Create lineplot.
plt.figure(figsize=(10,8))
plot_format()
sns.lineplot(data=national, 
             x='month', 
             hue='national_category', 
             y='appointment_count',errorbar=None)

plt.xticks(rotation=45)
plt.legend(bbox_to_anchor=(0.5, -0.15), loc='upper center',ncol=2)
plt.title('Number of appointments by national category')
plt.tight_layout()
plt.show()
#plt.savefig("slide_5.png", dpi =150, bbox_inches='tight')
No description has been provided for this image

Note: Routine and Acute general consultations are the main driver.

National Categories continued: Filtered out least used categories (see conclusions)¶

In [108]:
# Keep only specific categories
categories_to_keep = ['Planned Clinics', 'General Consultation Routine', 'Clinical Triage', 
                     'Planned Clinical Procedure']  

national_filtered = national[national['national_category'].isin(categories_to_keep)].copy()
In [109]:
plot_format()
sns.lineplot(data=national_filtered,
            x='month',
            y='appointment_count',
            hue='national_category')
# Move legend below the chart
plt.legend(
    bbox_to_anchor=(0, -0.2),  # x, y coordinates 
    loc='upper left',ncol=3);
plt.title('Autumn and March Planned Procedures and Clinics')
plt.xticks(rotation=45)
plt.savefig('slide_5a.png', dpi=150, bbox_inches='tight')
plt.show()
No description has been provided for this image

Note: Planned procedures and planned clinics peak in Autumn, overall peaks in autumn and March.

Filter for General Practices only¶

In [110]:
# Apply two filters at once
gp_filtered = national[
    (national['service_setting'] == 'General Practice') & 
    (national['national_category'].isin(categories_to_keep))
].copy()
In [111]:
plot_format()
sns.lineplot(data=gp_filtered,
            x='month',
            y='appointment_count',
            hue='national_category'
            )
plt.legend(
    bbox_to_anchor=(0, -0.2),  # x, y coordinates 
    loc='upper left', ncol=3)
plt.title('Autumn and March Planned Procedures and Clinics (GPs)')
plt.xticks(rotation=45)
plt.savefig('slide_5b.png', dpi=150, bbox_inches='tight')
plt.show()
No description has been provided for this image

Note: Similar pattern in General Practice compared to overall pattern

Filter for PCNs only¶

In [112]:
# Apply two filters at once
pcn_filtered = national[
    (national['service_setting'] == 'Primary Care Network') & 
    (national['national_category'].isin(categories_to_keep))
].copy()
In [113]:
plot_format()
sns.lineplot(data=pcn_filtered,
            x='month',
            y='appointment_count',
            hue='national_category')
plt.legend(
    bbox_to_anchor=(0, -0.2),  # x, y coordinates 
    loc='upper left',ncol=3);
plt.title('Autumn and March Planned Procedures and Clinics (PCNs)')
plt.xticks(rotation=45)
plt.savefig('slide_5c.png', dpi=150, bbox_inches='tight')
plt.show()
No description has been provided for this image

Note: different pattern in PCNs. Routine consultations peak in March / May

In [114]:
# Pivot data for Extended Access Provision only
natcat_pivot4 = pd.pivot_table(
    data=nc[nc['service_setting']=='Extended Access Provision'],
    values='appointment_count',
    index='month',
    columns='national_category',
    aggfunc='sum'
)
# Reindex the pivot table with the specified month order
ordered_pivot = natcat_pivot4.reindex(month_order)
filtered_natcat4 = ordered_pivot.drop(['Group Consultation and Group Education',
                                       'Walk-in','Care Home Needs Assessment & Personalised Care and Support Planning',
                                       'Care Home Visit','Non-contractual chargeable work',
                                       'Patient contact during Care Home Round',
                                       'Service provided by organisation external to the practice',
                                       'Social Prescribing Service','Structured Medication Review','Home Visit',
                                       'Inconsistent Mapping','Unplanned Clinical Activity'], axis=1)
In [115]:
plot_format()
sns.lineplot(data=filtered_natcat4)
# Move legend below the chart
plt.legend(
    bbox_to_anchor=(0, -0.2),  # x, y coordinates 
    loc='upper left',ncol=3);
plt.title('Extended Access Provision Appointments')
plt.xticks(rotation=45)
plt.savefig('slide_5d.png', dpi=150, bbox_inches='tight')
plt.show()
No description has been provided for this image

Note: Significant peak for planned clinics in Autumn and a smaller peak for planned procedures. In February, the most significant peak is for Acute (sudden severe) general consultation. May warmer weather lead to more allergic reactions>

Focus on Clinical Triage (as triage is a reflection of high demand)¶

In [116]:
filtered = nc[nc['national_category'] == 'Clinical Triage']
triage = filtered.groupby('day')['appointment_count'].sum().reset_index()
In [117]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=triage['day'],
    y=triage['appointment_count'],
    marker_color='blue',  # Direct color assignment
    name='Clinical Triage'
))

fig.update_layout(
    title='Clinical Triage Appointments by Day',
    xaxis_title='Day',
    yaxis_title='Appointment Count'
)
fig.show()

Above pattern reflects overall appointments

In [118]:
triage = (nc.groupby('region')
          .agg(
              total_appointments=('appointment_count', 'sum'),
              triage_appointments=('appointment_count', lambda x: x[nc.loc[x.index, 
                                   'national_category'] == 'Clinical Triage'].sum())
          )
          .assign(pct=lambda x: (x['triage_appointments'] / x['total_appointments'] * 100).round(1))
          .reset_index())
triage
Out[118]:
region total_appointments triage_appointments pct
0 East of England 34055047 4983513 14.6
1 London 43484439 6827053 15.7
2 Midlands 57352329 7896633 13.8
3 North East & Yorkshire 47915966 5815851 12.1
4 North West 35865459 5261201 14.7
5 South East 45003534 6529038 14.5
6 South West 32369996 4233675 13.1
In [119]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=triage['region'],
    y=triage['total_appointments'],
    marker_color='blue',  # Direct color assignment
    name='Other'
))

fig.add_trace(go.Bar(
    x=triage['region'],
    y=triage['triage_appointments'],
    marker_color='red',  # Direct color assignment
    name='Triage '
))

fig.update_layout(
    title='Clinical Triage Appointments by Region',
    xaxis_title='Region',
    yaxis_title='Appointment Count'
)
fig.show()
In [120]:
# Create pivot
triage = (nc
    .pivot_table(
        values='appointment_count',
        index=['month','service_setting'],
        columns='national_category',
        aggfunc='sum',
        fill_value=0
    )
    .assign(
        total_appointments=lambda x: x.sum(axis=1),
        pct=lambda x: (x.get('Clinical Triage', 0) / x.sum(axis=1) * 100).round(1)
    )
    .rename(columns={'Clinical Triage': 'triage_appointments'})
    [['total_appointments', 'triage_appointments', 'pct']]
    .reindex(month_order, level='month')
    .reset_index()
         )
In [121]:
fig = px.line(triage, 
              x='month', 
              y='pct', 
              color='service_setting',
              title='Clinical Triage Appointments by Service Setting',
              markers=True)

fig.update_layout(
    xaxis_title='Month',
    yaxis_title='Percentage (%)',
    height=500
)
In [122]:
nc.head(1)
Out[122]:
appointment_date icb_ons_code icb_location_name sub_icb_code service_setting context_type national_category appointment_count appointment_month month day region year
0 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Primary Care Network Care Related Encounter Patient contact during Care Home Round 3 2021-08 August Monday North East & Yorkshire 2021
In [123]:
triage2 = (nc.groupby('service_setting')
          .agg(
              total_appointments=('appointment_count', 'sum'),
              triage_appointments=('appointment_count', lambda x: x[nc.loc[x.index, 
                                   'national_category'] == 'Clinical Triage'].sum())
          )
          .assign(pct=lambda x: (x['triage_appointments'] / x['total_appointments'] * 100).round(1))
           .reset_index()
          )
triage2
Out[123]:
service_setting total_appointments triage_appointments pct
0 Extended Access Provision 2176807 382876 17.6
1 General Practice 270811691 40246576 14.9
2 Other 5420076 575083 10.6
3 Primary Care Network 6557386 342429 5.2
4 Unmapped 11080810 0 0.0
In [124]:
#chart moved to conclusions

Primary Care Networks are under less pressure, suggesting they work better than individual practices. Or it's lower because PCNs concentrate on preventative care

GPs = reactive care focus / doctor centric PCN - preventative care focus / range of roles / clinical pharmacists / mental health / chronic disease proactively managed

Create above over months¶

Compare authorities with highest and lowest PCNs: Cambridge (056) & South Essex (026)¶

In [125]:
filtered = ar[
    ar['icb_ons_code'].str.contains('000056', case=False, na=False)|
    ar['icb_ons_code'].str.contains('000026', case=False, na=False)
]
filtered.tail(3)
Out[125]:
icb_ons_code appointment_month appointment_status hcp_type appointment_mode booking_lead_time appointment_count month region
596199 E54000056 2022-03 Attended GP Telephone Same Day 78283 March East of England
596208 E54000056 2021-09 Attended GP Telephone Same Day 78801 September East of England
596241 E54000056 2021-11 Attended GP Telephone Same Day 80719 November East of England
In [126]:
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab(filtered['icb_ons_code'], 
                       filtered['appointment_status'], 
                       values=filtered['appointment_count'], 
                       aggfunc='sum')
           .reset_index()
           .assign(
               Total=lambda x: x['Attended'] + x['DNA'],
               DNA_rate=lambda x: x['DNA'] / (x['Attended'] + x['DNA']),
               Attended_rate=lambda x: x['Attended'] / (x['Attended'] + x['DNA'])
           ))
cam_ess
Out[126]:
appointment_status icb_ons_code Attended DNA Unknown Total DNA_rate Attended_rate
0 E54000026 5225460 166364 230151 5391824 0.030855 0.969145
1 E54000056 4756334 174186 183691 4930520 0.035328 0.964672
In [127]:
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab([filtered['icb_ons_code'], filtered['appointment_mode']], 
                       filtered['appointment_status'], 
                       values=filtered['appointment_count'], 
                       aggfunc='sum')
           .reset_index()
           .assign(
               Total=lambda x: x['Attended'] + x['DNA'],
               DNA_rate=lambda x: x['DNA'] / (x['Attended'] + x['DNA']),
               attended_rate=lambda x: x['Attended'] / (x['Attended'] + x['DNA'])
           ))
cam_ess
Out[127]:
appointment_status icb_ons_code appointment_mode Attended DNA Unknown Total DNA_rate attended_rate
0 E54000026 Face-to-Face 3518398 144996 173329 3663394 0.039580 0.960420
1 E54000026 Home Visit 9975 342 502 10317 0.033149 0.966851
2 E54000026 Telephone 1440289 11070 38266 1451359 0.007627 0.992373
3 E54000026 Unknown 191566 6232 15220 197798 0.031507 0.968493
4 E54000026 Video/Online 65232 3724 2834 68956 0.054005 0.945995
5 E54000056 Face-to-Face 2795587 154868 124568 2950455 0.052490 0.947510
6 E54000056 Home Visit 12370 155 956 12525 0.012375 0.987625
7 E54000056 Telephone 1642559 9908 44132 1652467 0.005996 0.994004
8 E54000056 Unknown 262449 7007 11215 269456 0.026004 0.973996
9 E54000056 Video/Online 43369 2248 2820 45617 0.049280 0.950720
In [128]:
cam_ess = (pd.crosstab(filtered['icb_ons_code'], 
                       filtered['hcp_type'], 
                       values=filtered['appointment_count'], 
                       aggfunc='sum')
           .reset_index()
           .assign(
               Total=lambda x: x['GP'] + x['Other Practice staff'],
               GP_pct=lambda x: x['GP'] / (x['GP'] + x['Other Practice staff'])
           ))
cam_ess
Out[128]:
hcp_type icb_ons_code GP Other Practice staff Unknown Total GP_pct
0 E54000026 2694766 2729563 197646 5424329 0.496793
1 E54000056 2232712 2612805 268694 4845517 0.460779

As expected higher proportion of GP appointments in South Essex (GP) compared to PCN. This perhaps explains slightly higher attendance above.

However, whilst Face to Face visits are lower in Cambridge, all other modes are higher

In [129]:
# Cambridge vs South Essex - breakdown by status
cam_ess = (pd.crosstab([filtered['icb_ons_code'], filtered['appointment_mode']], 
                       filtered['hcp_type'], 
                       values=filtered['appointment_count'], 
                       aggfunc='sum')
           .reset_index()
           .assign(
               Total=lambda x: x['Other Practice staff'] + x['GP'],
               GP_pct=lambda x: x['GP'] / (x['Other Practice staff'] + x['GP'])
               .round(2)
           ))
cam_ess
Out[129]:
hcp_type icb_ons_code appointment_mode GP Other Practice staff Unknown Total GP_pct
0 E54000026 Face-to-Face 1693744.0 2142979.0 NaN 3836723.0 0.441456
1 E54000026 Home Visit 3998.0 6821.0 NaN 10819.0 0.369535
2 E54000026 Telephone 962926.0 526699.0 NaN 1489625.0 0.646422
3 E54000026 Unknown 7718.0 7654.0 197646.0 15372.0 0.502082
4 E54000026 Video/Online 26380.0 45410.0 NaN 71790.0 0.367461
5 E54000056 Face-to-Face 1078383.0 1984489.0 12151.0 3062872.0 0.352082
6 E54000056 Home Visit 4812.0 8140.0 529.0 12952.0 0.371526
7 E54000056 Telephone 1110505.0 571659.0 14435.0 1682164.0 0.660165
8 E54000056 Unknown 31139.0 7953.0 241579.0 39092.0 0.796557
9 E54000056 Video/Online 7873.0 40564.0 NaN 48437.0 0.162541
In [130]:
fig = px.bar(cam_ess, 
             x='appointment_mode', 
             y='GP_pct',
             color='icb_ons_code',  # This works in px.bar
             title='GP Percentage by Appointment Mode and ICB Code',
             labels={'GP_pct': 'GP Percentage', 'appointment_mode': 'Appointment Mode'})

fig.update_layout(
    xaxis_tickangle=-45,
    barmode='group'
)

fig.show()

Exploration: Utilisation and Capacity¶

In [131]:
# Calculate overall utilisation (95.63%)
percent = pd.pivot_table(
    data=ar,
    values='appointment_count',
    columns='appointment_status',
    aggfunc='sum'
)
percent['total'] = percent['Attended'] + percent['DNA']
percent['utilisation(%)'] = (percent['Attended']/ percent['total'] * 100).round(2)
percent
Out[131]:
appointment_status Attended DNA Unknown total utilisation(%)
appointment_count 270479029 13280848 11992521 283759877 95.32

Only 95.3% appointments are being utilised. This tends to suggest that there is slack in the system. Staffing levels therefore may/may not be adequate. This is incertain if networks are not at 100% capacity.

Utilisation by month¶

In [132]:
# Pivot to get monthly utilisation
month_ = (
    ar.pivot_table(values='appointment_count', 
                   index='month', 
                   columns='appointment_status', 
                   aggfunc='sum')
      .assign(
          total=lambda df: df['Attended'] + df['DNA'],
          utilisation_pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
      )
      .reindex(month_order)
)
month_
Out[132]:
appointment_status Attended DNA Unknown total utilisation_pct
month
August 22072706.0 944674.0 812626.0 23017380.0 95.90
September 25745698.0 1318269.0 1435376.0 27063967.0 95.13
October 27158054.0 1562316.0 1560182.0 28720370.0 94.56
November 27649315.0 1425547.0 1301758.0 29074862.0 95.10
December 22840398.0 1195213.0 1082466.0 24035611.0 95.03
January 23580128.0 1072474.0 955851.0 24652602.0 95.65
February 23333376.0 1073255.0 919910.0 24406631.0 95.60
March 27149715.0 1287042.0 1128660.0 28436757.0 95.47
April 21935445.0 1042100.0 912479.0 22977545.0 95.46
May 25318735.0 1195368.0 945083.0 26514103.0 95.49
June 23695459.0 1164590.0 938130.0 24860049.0 95.32
July NaN NaN NaN NaN NaN
In [133]:
# Calculate the average utilisation of services by month Utilisation based on attended / DNA
monthly_apps = pd.pivot_table(
    data=ar,
    values='appointment_count',
    index='month',
    columns=['appointment_status'],
    aggfunc='sum'
)

monthly_apps['total'] = monthly_apps['Attended'] + monthly_apps['DNA']
# re-order months in pivot table
monthly_apps = monthly_apps.reindex(month_order)
# Created a column showing % of appointments used
monthly_apps['utilisation(%)'] = (monthly_apps['Attended']/ monthly_apps['total'] * 100).round(2)
# Created a column showing to a daily value (based on 30 days)
monthly_apps['daily_attendance'] = (monthly_apps['Attended'] / 30).round(2)
# View the DataFrame.
monthly_apps
Out[133]:
appointment_status Attended DNA Unknown total utilisation(%) daily_attendance
month
August 22072706.0 944674.0 812626.0 23017380.0 95.90 735756.87
September 25745698.0 1318269.0 1435376.0 27063967.0 95.13 858189.93
October 27158054.0 1562316.0 1560182.0 28720370.0 94.56 905268.47
November 27649315.0 1425547.0 1301758.0 29074862.0 95.10 921643.83
December 22840398.0 1195213.0 1082466.0 24035611.0 95.03 761346.60
January 23580128.0 1072474.0 955851.0 24652602.0 95.65 786004.27
February 23333376.0 1073255.0 919910.0 24406631.0 95.60 777779.20
March 27149715.0 1287042.0 1128660.0 28436757.0 95.47 904990.50
April 21935445.0 1042100.0 912479.0 22977545.0 95.46 731181.50
May 25318735.0 1195368.0 945083.0 26514103.0 95.49 843957.83
June 23695459.0 1164590.0 938130.0 24860049.0 95.32 789848.63
July NaN NaN NaN NaN NaN NaN
In [134]:
plot_format()
# Plot monthly capacity utilisation.
sns.lineplot(data=monthly_apps,x='month',y='total',label='Total')
sns.lineplot(data=monthly_apps,x='month',y='Attended', label='Attended')
sns.lineplot(data=monthly_apps,x='month',y='DNA',label='Did Not Attend')
plt.legend()
plt.title('Chart showing 95% attendance')
# Re-order months
ax = plt.gca()
plt.xticks(rotation=45)
plt.title("Appointment Attendance");
# Create a lineplot.
No description has been provided for this image

Utilisation ranges between 94.56% and 95.9%

Utilisation and staff¶

In [135]:
# Determine % of appointments cancelled relating to hcp_type
hcp =(
    ar.pivot_table(
    values='appointment_count',
    index='hcp_type',
    columns='appointment_status',
    aggfunc='sum'
    )
    .assign(
        total = lambda df: df['Attended'] + df['DNA'], # create total
        pct = lambda df: (df['Attended']/ df['total'] * 100).round(2) # create %
        ))
hcp
Out[135]:
appointment_status Attended DNA Unknown total pct
hcp_type
GP 140102040 3829781 4305542 143931821 97.34
Other Practice staff 121734048 9178841 6933459 130912889 92.99
Unknown 8642941 272226 753520 8915167 96.95
In [136]:
# Medical appointments
hcp_pivot = pivot_function(
    ar, 'hcp_type', 'appointment_status', 'appointment_count', 'month', 
    target_outcome='Attended'
)
hcp_pivot = hcp_pivot.reindex(month_order)
hcp_pivot
Out[136]:
hcp_type GP Other Practice staff GP Other Practice staff
appointment_status Attended DNA Attended DNA Attended (%) Attended (%)
month
August 11708057.0 271814.0 9688402.0 652491.0 97.73 93.69
September 13592889.0 392037.0 11358456.0 901514.0 97.20 92.65
October 13339524.0 439647.0 12888298.0 1090065.0 96.81 92.20
November 14050249.0 388131.0 12644696.0 1002219.0 97.31 92.66
December 11957206.0 319638.0 10113211.0 846302.0 97.40 92.28
January 12470633.0 302384.0 10401221.0 747854.0 97.63 93.29
February 12198417.0 309926.0 10417610.0 742160.0 97.52 93.35
March 14305014.0 382332.0 12001478.0 880620.0 97.40 93.16
April 11258586.0 304701.0 9968784.0 718013.0 97.36 93.28
May 13049643.0 364671.0 11474524.0 808645.0 97.28 93.42
June 12171822.0 354500.0 10777368.0 788958.0 97.17 93.18
July NaN NaN NaN NaN NaN NaN

Both GPs and other staff are busy at similar types, although during autumn significantly more pressure is on 'other healthcare professionals'.

In [137]:
plot_format()
# Pandas method
ax = hcp_pivot[[('GP', 'Attended (%)'), ('Other Practice staff', 'Attended (%)')]].plot(
    kind='line', 
    marker='o',
    title='Appointments Attended by HCP Type (%)'
)
ax.set_ylabel('Attended (%)')
ax.legend(['GP', 'Other Practice staff'])
plt.tight_layout()
plt.show()
No description has been provided for this image

Utilisation and Appointment Mode¶

Overall, Face-to-Face appointments are cancelled most frequently and Telephone appointments the least.

In [138]:
# Determine % of appointments cancelled by mode
mode = (
    ar.pivot_table(
        values='appointment_count',
        index='appointment_mode',
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
    )
)
mode
Out[138]:
appointment_status Attended DNA Unknown total pct
appointment_mode
Face-to-Face 164434900 10743715 8174625 175178615 93.87
Home Visit 1494800 76189 390143 1570989 95.15
Telephone 95306218 2080948 2792120 97387166 97.86
Unknown 7829901 303177 586592 8133078 96.27
Video/Online 1413210 76819 49041 1490029 94.84

Utilisation by Appointment Mode and Month¶

In [139]:
# Medical appointments
mode_pivot = pivot_function(
    ar, 'appointment_mode', 'appointment_status', 'appointment_count', 'month', 
    target_outcome='Attended'
)
mode_pivot = mode_pivot.reindex(month_order)
mode_pivot
Out[139]:
appointment_mode Face-to-Face Home Visit Telephone Video/Online Face-to-Face Home Visit Telephone Video/Online
appointment_status Attended DNA Attended DNA Attended DNA Attended DNA Attended (%) Attended (%) Attended (%) Attended (%)
month
August 12481186.0 723111.0 113533.0 6549.0 8724392.0 184850.0 101541.0 5136.0 94.52 94.55 97.93 95.19
September 15142705.0 1066003.0 126549.0 7517.0 9575370.0 205075.0 137877.0 7882.0 93.42 94.39 97.90 94.59
October 16982958.0 1312707.0 132144.0 7574.0 8978200.0 190570.0 156905.0 8900.0 92.83 94.58 97.92 94.63
November 16949964.0 1167488.0 149543.0 8729.0 9497883.0 201527.0 146366.0 8595.0 93.56 94.48 97.92 94.45
December 13597975.0 971685.0 132406.0 8320.0 8289616.0 175815.0 114154.0 7146.0 93.33 94.09 97.92 94.11
January 13921132.0 851572.0 127718.0 7413.0 8776010.0 182045.0 120476.0 6494.0 94.24 94.51 97.97 94.89
February 14059436.0 858543.0 129188.0 6069.0 8388916.0 179837.0 118086.0 6131.0 94.24 95.51 97.90 95.06
March 16536414.0 1035050.0 152788.0 6369.0 9579993.0 212548.0 140662.0 7199.0 94.11 96.00 97.83 95.13
April 13588356.0 841473.0 132294.0 5221.0 7492249.0 169628.0 117385.0 5983.0 94.17 96.20 97.79 95.15
May 16020354.0 968223.0 153939.0 6632.0 8350132.0 193483.0 133287.0 6804.0 94.30 95.87 97.74 95.14
June 15154420.0 947860.0 144698.0 5796.0 7653457.0 185570.0 126471.0 6549.0 94.11 96.15 97.63 95.08
July NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Chart moved to conclusions:

In [140]:
# Plotly Method
fig = go.Figure()

# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Face-to-Face', 'Attended (%)')],
    mode='lines+markers',
    name='Face to Face'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Home Visit', 'Attended (%)')],
    mode='lines+markers', 
    name='Home Visit'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index,
    y=mode_pivot[('Telephone','Attended (%)')],
    mode='lines+markers',
    name='Telephone'
))  
fig.add_trace(go.Scatter(
    x=mode_pivot.index,
    y=mode_pivot[('Video/Online','Attended (%)')],
    mode='lines+markers',
    name='Video'
))
fig.update_layout(
    title='Appointments Attended by Mode (%)',
    xaxis_title='Month',
    yaxis_title='Attended (%)',
    height=500
)

fig.show()

There are considerably more home visits from late winter into spring. What might be driving this?

Total appointments by mode¶

Home visit attendance ranges between 94.1 and 96.2%. Highest attendance during busiest period.

In [141]:
# Plotly Method
fig = go.Figure()

# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Face-to-Face', 'Attended')],
    mode='lines+markers',
    name='Face to Face'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Home Visit', 'Attended')],
    mode='lines+markers', 
    marker=dict(symbol='square',size=10,color='red'),
    name='Home Visit'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index,
    y=mode_pivot[('Telephone','Attended')],
    mode='lines+markers',
    name='Telephone'
))  
fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Video/Online', 'Attended')],
    mode='lines+markers',
    line=dict(dash='dot', width=3,color='yellow'),
    name='Video'
))
fig.update_layout(
    title='Appointment Totals by Mode',
    xaxis_title='Month',
    yaxis_title='Attended',
    height=500
)

fig.show()

Q4: During the busiest months Face-to-Face visits are the most common appointment mode.

Utilisation by Region¶

Region has very little influence on it's own (see below).

In [142]:
# Determine % of appointments cancelled relating to region
region = pd.pivot_table(
    data=ar,
    values='appointment_count',
    index='region',
    columns='appointment_status',
    aggfunc='sum'
)
region['total'] = region['Attended'] + region['DNA'] # create total
region['utilisation(%)'] = (region['Attended']/ region['total'] * 100).round(2) # create %
region
Out[142]:
appointment_status Attended DNA Unknown total utilisation(%)
region
East of England 31569080 1202469 1261910 32771549 96.33
London 39207133 2406467 1809280 41613600 94.22
Midlands 52562333 2557642 2194472 55119975 95.36
North East & Yorkshire 44172983 1890871 1799711 46063854 95.90
North West 32034002 1991351 1755759 34025353 94.15
South East 41163384 1970145 1846803 43133529 95.43
South West 29770114 1261903 1324586 31032017 95.93

Month also has little impact on utilisation by itself (see below)

In [ ]:
 

Booking Lead Time Analysis¶

Q5: Most appointments are 'same day' bookings (see below)

In [143]:
plot_format()
# Create a line plot to answer the question.
sns.lineplot(data=ar, x='month',y='appointment_count',hue='booking_lead_time',errorbar=None)
plt.title('Most appointments booked on same day')
plt.xticks(rotation=45)
plt.legend(loc='center left')
plt.savefig('slide_9',dpi=150, bbox_inches='tight')
No description has been provided for this image

Clear that same day appointments are most attended and those over 28 days are least attended.

In [144]:
plot_format()
sns.histplot(
    data=ar, x='booking_lead_time', weights='appointment_count',
    hue='appointment_status', multiple='fill', shrink=0.8, stat='percent'
)
plt.ylabel('Percentage')
plt.xticks(rotation=45)
plt.show()
#plt.savefig('slide_8',dpi=150, bbox_inches='tight');
No description has been provided for this image
In [ ]:
 
In [145]:
# Pivot to get monthly utilisation
month_ = (
    ar.pivot_table(values='appointment_count', 
                   index='month', 
                   columns='appointment_status', 
                   aggfunc='sum')
      .assign(
          total=lambda df: df['Attended'] + df['DNA'],
          utilisation_pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
      )
      .reindex(month_order)
)
month_
Out[145]:
appointment_status Attended DNA Unknown total utilisation_pct
month
August 22072706.0 944674.0 812626.0 23017380.0 95.90
September 25745698.0 1318269.0 1435376.0 27063967.0 95.13
October 27158054.0 1562316.0 1560182.0 28720370.0 94.56
November 27649315.0 1425547.0 1301758.0 29074862.0 95.10
December 22840398.0 1195213.0 1082466.0 24035611.0 95.03
January 23580128.0 1072474.0 955851.0 24652602.0 95.65
February 23333376.0 1073255.0 919910.0 24406631.0 95.60
March 27149715.0 1287042.0 1128660.0 28436757.0 95.47
April 21935445.0 1042100.0 912479.0 22977545.0 95.46
May 25318735.0 1195368.0 945083.0 26514103.0 95.49
June 23695459.0 1164590.0 938130.0 24860049.0 95.32
July NaN NaN NaN NaN NaN
In [ ]:
 
In [ ]:
 

HCP and Mode Combined¶

Other Staff and Face-to-Face / online signicantly under utilised

In [146]:
hcp_plus_mode= (
    ar.pivot_table(
        values='appointment_count',
        index=['hcp_type','appointment_mode'],
        columns='appointment_status',
        aggfunc='sum')
    .assign(
        total = lambda df: df['Attended'] + df['DNA'], # create total
        pct = lambda df: (df['Attended']/ df['total'] * 100).round(2)
        ))
hcp_plus_mode
Out[146]:
appointment_status Attended DNA Unknown total pct
hcp_type appointment_mode
GP Face-to-Face 70089048 2599222 2420398 72688270 96.42
Home Visit 647267 21000 147014 668267 96.86
Telephone 67330728 1144185 1637126 68474913 98.33
Unknown 1351566 44347 79520 1395913 96.82
Video/Online 683431 21027 21484 704458 97.02
Other Practice staff Face-to-Face 92755796 8099216 5485883 100855012 91.97
Home Visit 686722 27058 170434 713780 96.21
Telephone 26348843 893656 1074142 27242499 96.72
Unknown 1216193 103202 175506 1319395 92.18
Video/Online 726494 55709 27494 782203 92.88
Unknown Face-to-Face 1590056 45277 268344 1635333 97.23
Home Visit 160811 28131 72695 188942 85.11
Telephone 1626647 43107 80852 1669754 97.42
Unknown 5262142 155628 331566 5417770 97.13
Video/Online 3285 83 63 3368 97.54

Lead time and month combined¶

In [147]:
lead_plus_month = (
    pd.pivot_table(
        data=ar,
        values='appointment_count',
        index=['booking_lead_time', 'month'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total = lambda df: df['Attended'] + df['DNA'],
        utilisation_percent = lambda df: (df['Attended'] / df['total'] * 100).round(2)
    ))
lead_plus_month.head(30)
Out[147]:
appointment_status Attended DNA Unknown total utilisation_percent
booking_lead_time month
1 Day April 1943406 80115 63820 2023521 96.04
August 1883711 72733 60335 1956444 96.28
December 2049032 97988 71369 2147020 95.44
February 2115779 87138 68885 2202917 96.04
January 2174201 91231 65785 2265432 95.97
June 2033281 85991 67972 2119272 95.94
March 2518219 105818 84153 2624037 95.97
May 2109755 86115 69500 2195870 96.08
November 2317942 103812 83583 2421754 95.71
October 2387826 116880 92706 2504706 95.33
September 2470440 111702 82836 2582142 95.67
15 to 21 Days April 1474721 136060 108248 1610781 91.55
August 1271577 105198 75235 1376775 92.36
December 1428329 149028 130653 1577357 90.55
February 1305166 116372 101830 1421538 91.81
January 1110841 103922 96784 1214763 91.45
June 1654592 152401 107109 1806993 91.57
March 1630248 147845 127252 1778093 91.69
May 1548172 133148 99261 1681320 92.08
November 1904536 184407 161833 2088943 91.17
October 1652333 180132 172505 1832465 90.17
September 1525576 155199 187001 1680775 90.77
2 to 7 Days April 4090959 262892 172254 4353851 93.96
August 4494484 261761 167225 4756245 94.50
December 4198816 321505 219869 4520321 92.89
February 4955040 317775 192251 5272815 93.97
January 5089519 322082 208222 5411601 94.05
June 4265542 278651 161352 4544193 93.87
March 5492887 357962 227040 5850849 93.88
May 5054848 326914 182062 5381762 93.93
In [148]:
nc_apps = (
    pd.pivot_table(
        data=nc,
        values='appointment_count',
        index='month',
        columns='service_setting',
        aggfunc='sum'
    )
    .assign(
        total = lambda df: (
            df['Primary Care Network'] +
            df['Extended Access Provision'] +
            df['General Practice'] +
            df['Other'] +
            df['Unmapped']
        )
    )
    .reindex(month_order)
    .drop('July', axis=0)
)
nc_apps
Out[148]:
service_setting Extended Access Provision General Practice Other Primary Care Network Unmapped total
month
August 160927.0 21575852.0 449101.0 432448.0 1233843.0 23852171.0
September 187906.0 25940821.0 527174.0 530485.0 1336115.0 28522501.0
October 209539.0 27606171.0 556487.0 564981.0 1366656.0 30303834.0
November 207577.0 27767889.0 558784.0 614324.0 1256496.0 30405070.0
December 173504.0 23008818.0 464718.0 539479.0 954257.0 25140776.0
January 186375.0 23583053.0 457440.0 569044.0 839562.0 25635474.0
February 196627.0 23305934.0 456153.0 585300.0 811246.0 25355260.0
March 231905.0 27187368.0 530677.0 702176.0 942912.0 29595038.0
April 192284.0 21916791.0 437402.0 606270.0 760313.0 23913060.0
May 220511.0 25238620.0 503327.0 712280.0 820770.0 27495508.0
June 209652.0 23680374.0 478813.0 700599.0 758640.0 25828078.0
In [149]:
# Apply formatting
plot_format()

exclude_cols = ['Unmapped', 'total']  # columns to exclude
ss_filt = nc_apps.drop(columns=exclude_cols)

sns.lineplot(data=ss_filt, marker='o')
plt.xticks(rotation=45)
plt.title('Greatest burden on Surgeries')
plt.tight_layout()  # helps prevent label cutoff

# Uncomment to save the figure
# plt.savefig('slide_10.png', dpi=150, bbox_inches='tight')

plt.show()
No description has been provided for this image

General Practices account for significant majority of appointments (see above)
EAP use may indicate over capacity for whole year
Perhaps PCNs could be utilised during busy periods

In [150]:
plot_format()

exclude = ['Unmapped','General Practice','total']
ss_filt = nc_apps.drop(columns=exclude)

sns.lineplot(data=ss_filt, markers=True)
plt.xticks(rotation=45)
plt.title('Could other services absorb more capacity?')
plt.legend(loc='upper left')
plt.tight_layout()

#plt.savefig('slide_11a',dpi=150, bbox_inches='tight')

plt.show()
No description has been provided for this image
In [151]:
nc.head()
Out[151]:
appointment_date icb_ons_code icb_location_name sub_icb_code service_setting context_type national_category appointment_count appointment_month month day region year
0 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Primary Care Network Care Related Encounter Patient contact during Care Home Round 3 2021-08 August Monday North East & Yorkshire 2021
1 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L Other Care Related Encounter Planned Clinics 7 2021-08 August Monday North East & Yorkshire 2021
2 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Home Visit 79 2021-08 August Monday North East & Yorkshire 2021
3 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter General Consultation Acute 725 2021-08 August Monday North East & Yorkshire 2021
4 2021-08-02 E54000050 NHS North East and North Cumbria ICB 00L General Practice Care Related Encounter Structured Medication Review 2 2021-08 August Monday North East & Yorkshire 2021

Provide a summary of your findings and recommendations based on the analysis.¶

  1. Overall utilisation of services was calculated at 95.63%. This figure alone suggests there is enough capacity overall in the system, however to evaluate capacity in full more data would be required in relation to working hours, specifically overtime hours. Without this data its impossible to form a definitive conclusion in relation to capacity.
In [ ]:
 
  1. Overall there is little variance in utilisation during a year It ranges between 94.71% and 96.27%. In terms of appointments overall there is a double peak, from January to March, and a second in June.
In [152]:
# Plot monthly capacity utilisation (slide 1)
plot_format()
sns.lineplot(data=monthly_apps,x='month',y='total',label='Total')
sns.lineplot(data=monthly_apps,x='month',y='Attended', label='Attended')
sns.lineplot(data=monthly_apps,x='month',y='DNA',label='Did Not Attend')
plt.legend()
plt.title('Chart showing 95% attendance')
ax = plt.gca()
plt.title("appointments peak in autumn and march")
plt.xticks(rotation=45)
plt.savefig('slide_1.png', dpi=150,bbox_inches='tight');
No description has been provided for this image
  1. Staffing during busy periods. The increased % of appointments by Other staff, peaking in October, is likely to be linked to vaccinations. Most of which take place in autumn and early winter.
In [153]:
hcp_pivot2 = (
    pd.pivot_table(
        data=ar,
        values='appointment_count',
        index='month',
        columns='hcp_type',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['GP'] + df['Other Practice staff'],
        **{
            'GP(%)': lambda df: (df['GP'] / df['total'] * 100).round(2),
            'Other(%)': lambda df: (df['Other Practice staff'] / df['total'] * 100).round(2)
        }
    )
)
In [154]:
#Create plot
sns.lineplot(data=hcp_pivot2,x='month',y='GP(%)', label='GP')
sns.lineplot(data=hcp_pivot2,x='month',y='Other(%)',label='Other Staff')
plt.legend()
plt.xticks(rotation=45)
plt.title("Pressure on whole practice")
plt.show()
No description has been provided for this image

Create a plot staff vs appointment mode. It appears Other staff have many more Face-to-Face visits What are the drivers behind the increase in Face-to-Face visits by both GPs and other staff in the surgery

In [155]:
# hcp and month
hcp = pd.pivot_table(
    data=ar,
    values='appointment_count',
    index=['hcp_type','month'],
    columns='appointment_mode',
    aggfunc='sum'
)
In [156]:
plot_format()
# Create the catplot
g = sns.catplot(data=ar, x='month', y='appointment_count', hue='appointment_mode', kind='bar', col='hcp_type')

# Rotate x-axis labels for all subplots
for ax in g.axes.flat: # creates a loop to check all subplots
    plt.setp(ax.get_xticklabels(), rotation=45)
plt.savefig('slide_18a',dpi=150, bbox_inches='tight')
plt.show()
No description has been provided for this image
  1. Increased demand during busy periods tends to be face to face appointments especially with Other Staff HCP Type. However Face-to-Face / other staff have a relatively low utilisation. A strategy is needed to either decrease face-to-face and or increase utilisation. Online sources indicate vaccinations take place in Autumn / March (covid booster). One solution might be to find alternative vaccination method.
In [157]:
# Reflect on utilisation during this period, can I add hcp_type

sns.lineplot(data=monthly_apps,x='month',y='utilisation(%)',label='Utilisation(%)')
plt.legend()
plt.title('Chart showing 95% attendance')
plt.xticks(rotation=90);
No description has been provided for this image
  1. Consider whether utilisation differs between hcp types and the mode of appointment? The lowest utilisation is for Face-to-Face meetings with Other Practice Staff, which explains previous slide which shows lower utilisation in October when Other Staff have more Face-to-Face appointments.
In [158]:
face_to_face = (
    pd.pivot_table(
        data=ar,
        values='appointment_count',
        index=['hcp_type', 'appointment_mode'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .drop(index='Unknown', level=0)
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        **{
            'utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
        }
    )
    .reset_index()
    .query("appointment_mode == 'Face-to-Face'")
)
face_to_face
Out[158]:
appointment_status hcp_type appointment_mode Attended DNA Unknown total utilisation(%)
0 GP Face-to-Face 70089048 2599222 2420398 72688270 96.42
5 Other Practice staff Face-to-Face 92755796 8099216 5485883 100855012 91.97

Other factors that affect utilisation: 'lead booking time' is a key factor.

In [159]:
# Pivot table to show utilisation (overall)
lead_time = (
    pd.pivot_table(
        data=ar,
        values='appointment_count',
        index=['booking_lead_time'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        **{
            'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
        }
    )
    .reset_index()
    .pipe(lead_time_order)
)
lead_time
Out[159]:
appointment_status booking_lead_time Attended DNA Unknown total Utilisation(%)
6 Same Day 125796242 2284169 2883749 128080411 98.22
0 1 Day 24003592 1039523 810944 25043115 95.85
2 2 to 7 Days 54636124 3689732 2394373 58325856 93.67
4 8 to 14 Days 32523007 2864860 2102542 35387867 91.90
1 15 to 21 Days 16506091 1563712 1367711 18069803 91.35
3 22 to 28 Days 9474372 931458 1011689 10405830 91.05
5 More than 28 Days 7539601 907394 1421513 8446995 89.26
In [160]:
# For overall pattern see conclusions
In [161]:
# Pivot table to show utilisation (GP)
lead_time_GP = (
    pd.pivot_table(
        data=ar[ar['hcp_type'] == 'GP'],
        values='appointment_count',
        index=['booking_lead_time'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        **{
            'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
        }
    )
    .reset_index()
    .pipe(lead_time_order)
)
lead_time
Out[161]:
appointment_status booking_lead_time Attended DNA Unknown total Utilisation(%)
6 Same Day 125796242 2284169 2883749 128080411 98.22
0 1 Day 24003592 1039523 810944 25043115 95.85
2 2 to 7 Days 54636124 3689732 2394373 58325856 93.67
4 8 to 14 Days 32523007 2864860 2102542 35387867 91.90
1 15 to 21 Days 16506091 1563712 1367711 18069803 91.35
3 22 to 28 Days 9474372 931458 1011689 10405830 91.05
5 More than 28 Days 7539601 907394 1421513 8446995 89.26
In [162]:
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plotting the bar plots for Attended and DNA
sns.barplot(data=lead_time_GP, x='booking_lead_time', y='Attended', ax=ax1, 
            color='orange', label='Attended',ci=None)
sns.barplot(data=lead_time_GP, x='booking_lead_time', y='DNA', ax=ax1, 
            color='blue', label='DNA',ci=None)

# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()

# Plot the line for utilisation percentage
sns.lineplot(data=lead_time_GP, x='booking_lead_time', y='Utilisation(%)', ax=ax2, 
             color='green', label='Utilisation(%)', marker='o', linestyle='-', linewidth=2,ci=None)

# Set labels and title
ax1.set_xlabel('Booking Lead Time')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')

# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.2,1))
ax2.legend(loc='upper right')

# Rotate x-axis labels
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')

plt.title('Attendance and booking lead times (GP)')
plt.xticks(rotation=45)
plt.savefig('slide_15.png', dpi=150, bbox_inches='tight')

plt.show();
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/2078063227.py:27: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

No description has been provided for this image
In [ ]:
 
In [163]:
# Pivot table to show utilisation (HCP)
lead_time_HCP = (
    pd.pivot_table(
        data=ar[ar['hcp_type'] == 'Other Practice staff'],
        values='appointment_count',
        index=['booking_lead_time'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        **{
            'Utilisation(%)': lambda df: (df['Attended'] / df['total'] * 100).round(2)
        }
    )
    .reset_index()
    .pipe(lead_time_order)
)
lead_time
Out[163]:
appointment_status booking_lead_time Attended DNA Unknown total Utilisation(%)
6 Same Day 125796242 2284169 2883749 128080411 98.22
0 1 Day 24003592 1039523 810944 25043115 95.85
2 2 to 7 Days 54636124 3689732 2394373 58325856 93.67
4 8 to 14 Days 32523007 2864860 2102542 35387867 91.90
1 15 to 21 Days 16506091 1563712 1367711 18069803 91.35
3 22 to 28 Days 9474372 931458 1011689 10405830 91.05
5 More than 28 Days 7539601 907394 1421513 8446995 89.26
In [164]:
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plotting the bar plots for Attended and DNA
sns.barplot(data=lead_time_HCP, x='booking_lead_time', y='Attended', ax=ax1, 
            color='orange', label='Attended',ci=None)
sns.barplot(data=lead_time_HCP, x='booking_lead_time', y='DNA', ax=ax1, 
            color='blue', label='DNA',ci=None)

# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()

# Plot the line for utilisation percentage
sns.lineplot(data=lead_time_HCP, x='booking_lead_time', y='Utilisation(%)', ax=ax2, 
             color='green', label='Utilisation(%)', marker='o', linestyle='-', linewidth=2,ci=None)

# Set labels and title
ax1.set_xlabel('Booking Lead Time')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')

# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.2,1))
ax2.legend(loc='upper right')

# Rotate x-axis labels
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')

plt.title('Attendance and booking lead times (Other Staff)')
plt.xticks(rotation=45)
plt.savefig('slide_15.png', dpi=150, bbox_inches='tight')

plt.show();
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/1862548190.py:27: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

No description has been provided for this image

Other findings. Most appointments are 6-10 minutes long. Tuesday and Wednesday are the most popular day for appointments. I estimated average appointment to be 18.6 minutes. Most appointments are routine consultations. Planned clinics and planned clinical procedures increase in March. Twin peaks in Autumn and March.HCP type is a significant factor in terms of utilisation. This is exacerbated in relation to 'Other Practice Staff'. Increase in winter spring home visits.

Additional Charts¶

In [165]:
lead_time_4 = (
pd.pivot_table(
    data = ar[ar['appointment_status'] == 'Unknown'],  # Filter first
    values='appointment_count',
    index='booking_lead_time',
    columns='appointment_status',
    aggfunc='sum'
    )
    .reset_index()
    .pipe(lead_time_order)
)          
lead_time_4
Out[165]:
appointment_status booking_lead_time Unknown
6 Same Day 2883749
0 1 Day 810944
2 2 to 7 Days 2394373
4 8 to 14 Days 2102542
1 15 to 21 Days 1367711
3 22 to 28 Days 1011689
5 More than 28 Days 1421513
In [166]:
#Unknown values mirror plot above suggesting this is not the cause of 1 day anomaly

# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))
sns.barplot(data=lead_time_4, x='booking_lead_time', y='Unknown')

# Ensure 'booking_lead_time' is categorical with the correct order
lead_time_4['booking_lead_time'] = pd.Categorical(lead_time_4['booking_lead_time'], 
                                                categories=day_order2, ordered=True)
plt.title('Distribution of unknown values')
plt.savefig('slide_17.png',dpi=150, bbox_inches='tight')
No description has been provided for this image
In [167]:
region_ = (
    ar.pivot_table(
        values='appointment_count',
        index='region',
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['Attended'] + df['DNA'],
        pct=lambda df: (df['Attended'] / df['total'] * 100).round(2)
    )
    .reset_index()
)
region_
Out[167]:
appointment_status region Attended DNA Unknown total pct
0 East of England 31569080 1202469 1261910 32771549 96.33
1 London 39207133 2406467 1809280 41613600 94.22
2 Midlands 52562333 2557642 2194472 55119975 95.36
3 North East & Yorkshire 44172983 1890871 1799711 46063854 95.90
4 North West 32034002 1991351 1755759 34025353 94.15
5 South East 41163384 1970145 1846803 43133529 95.43
6 South West 29770114 1261903 1324586 31032017 95.93
In [168]:
# Create a figure and axis
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plotting the bar plots for Attended and DNA
sns.barplot(data=region_, x='region', y='Attended', ax=ax1, color='orange', label='Attended')
sns.barplot(data=region_, x='region', y='DNA', ax=ax1, color='blue', label='DNA')

# Add a secondary y-axis for the utilisation line plot
ax2 = ax1.twinx()

# Plot the line for utilisation percentage
sns.lineplot(data=region_, x='region', y='pct', ax=ax2, color='green', label='Utilisation(%)', 
             marker='o', linestyle='-', linewidth=2)

# Set labels and title
ax1.set_xlabel('region')
ax1.set_ylabel('Count')
ax2.set_ylabel('Utilisation (%)')

# Add legends
ax1.legend(loc='upper left', bbox_to_anchor=(0.1, 1))
ax2.legend(loc='upper right')

ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45, ha='right')
plt.title('Attendance and utilisation by region')
plt.savefig('slide_13.png',dpi=150, bbox_inches='tight')
/var/folders/_g/lq327r795816_cw0zgqv50300000gn/T/ipykernel_8677/1999816355.py:23: UserWarning:

set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.

No description has been provided for this image
In [169]:
# Create a line plot to answer the question.
hcp_pivot = (
    pd.pivot_table(
        data=ar,
        values='appointment_count',
        index='month',
        columns='hcp_type',
        aggfunc='sum'
    )
    .assign(
        total=lambda df: df['GP'] + df['Other Practice staff'],
        **{
            'GP(%)': lambda df: (df['GP'] / df['total'] * 100).round(2),
            'Other(%)': lambda df: (df['Other Practice staff'] / df['total'] * 100).round(2)
        }
    )
    .reindex(month_order)
)

hcp_pivot
# greater variance
Out[169]:
hcp_type GP Other Practice staff Unknown total GP(%) Other(%)
month
August 12290112.0 10786456.0 753438.0 23076568.0 53.26 46.74
September 14477314.0 13116077.0 905952.0 27593391.0 52.47 47.53
October 14290342.0 14931278.0 1058932.0 29221620.0 48.90 51.10
November 14883843.0 14420204.0 1072573.0 29304047.0 50.79 49.21
December 12644693.0 11603844.0 869540.0 24248537.0 52.15 47.85
January 13123157.0 11694696.0 790600.0 24817853.0 52.88 47.12
February 12857137.0 11674435.0 794969.0 24531572.0 52.41 47.59
March 15113002.0 13515711.0 936704.0 28628713.0 52.79 47.21
April 11898608.0 11206857.0 784559.0 23105465.0 51.50 48.50
May 13780066.0 12803339.0 875781.0 26583405.0 51.84 48.16
June 12879089.0 12093451.0 825639.0 24972540.0 51.57 48.43
July NaN NaN NaN NaN NaN NaN
In [170]:
plot_format()
sns.lineplot(data=hcp_pivot,x='month',y='GP(%)', label='GP')
sns.lineplot(data=hcp_pivot,x='month',y='Other(%)',label='Other Staff')
plt.legend()
plt.title('Chart showing...')
plt.xticks(rotation=90)
plt.title("Pressure on whole practice")
Out[170]:
Text(0.5, 1.0, 'Pressure on whole practice')
No description has been provided for this image

Other stuff¶

In [171]:
# Create a line plot to answer the question (possibly create a facet by month later on?)
pivot_data = pd.pivot_table(
    data=ar,
    values='appointment_count',
    index=['month','appointment_status'],
    columns=['appointment_mode'],
    aggfunc='sum'
)
pivot_data
Out[171]:
appointment_mode Face-to-Face Home Visit Telephone Unknown Video/Online
month appointment_status
April Attended 13588356 132294 7492249 605161 117385
DNA 841473 5221 169628 19795 5983
Unknown 607970 32515 225109 44059 2826
August Attended 12481186 113533 8724392 652054 101541
DNA 723111 6549 184850 25028 5136
Unknown 500735 29571 233009 46233 3078
December Attended 13597975 132406 8289616 706247 114154
DNA 971685 8320 175815 32247 7146
Unknown 730885 38550 250753 58330 3948
February Attended 14059436 129188 8388916 637750 118086
DNA 858543 6069 179837 22675 6131
Unknown 601847 32563 239805 42649 3046
January Attended 13921132 127718 8776010 634792 120476
DNA 851572 7413 182045 24950 6494
Unknown 627739 34022 245226 45543 3321
June Attended 15154420 144698 7653457 616413 126471
DNA 947860 5796 185570 18815 6549
Unknown 626671 34094 233348 41140 2877
March Attended 16536414 152788 9579993 739858 140662
DNA 1035050 6369 212548 25876 7199
Unknown 747627 38395 286274 52581 3783
May Attended 16020354 153939 8350132 661023 133287
DNA 968223 6632 193483 20226 6804
Unknown 619678 37335 240340 44158 3572
November Attended 16949964 149543 9497883 905559 146366
DNA 1167488 8729 201527 39208 8595
Unknown 899217 41973 285646 69807 5115
October Attended 16982958 132144 8978200 907847 156905
DNA 1312707 7574 190570 42565 8900
Unknown 1164314 36299 273141 78215 8213
September Attended 15142705 126549 9575370 763197 137877
DNA 1066003 7517 205075 31792 7882
Unknown 1047942 34826 279469 63877 9262
In [172]:
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Face-to-Face',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Home Visit',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Telephone',hue='appointment_status')
plot_format()
sns.lineplot(data=pivot_data, x='month',y='Video/Online',hue='appointment_status')
Out[172]:
<Axes: xlabel='month', ylabel='Face-to-Face'>
No description has been provided for this image
In [173]:
# Determine % of appointments cancelled relating to hcp_type
hcp = pd.pivot_table(
    data=ar,
    values='appointment_count',
    index=['hcp_type','appointment_mode'],
    columns='appointment_status',
    aggfunc='sum'
)
hcp['total'] = hcp['Attended'] + hcp['DNA']
# Created a column showing % of appointments used
hcp['utilisation(%)'] = (hcp['Attended']/ hcp['total'] * 100).round(2)

hcp
Out[173]:
appointment_status Attended DNA Unknown total utilisation(%)
hcp_type appointment_mode
GP Face-to-Face 70089048 2599222 2420398 72688270 96.42
Home Visit 647267 21000 147014 668267 96.86
Telephone 67330728 1144185 1637126 68474913 98.33
Unknown 1351566 44347 79520 1395913 96.82
Video/Online 683431 21027 21484 704458 97.02
Other Practice staff Face-to-Face 92755796 8099216 5485883 100855012 91.97
Home Visit 686722 27058 170434 713780 96.21
Telephone 26348843 893656 1074142 27242499 96.72
Unknown 1216193 103202 175506 1319395 92.18
Video/Online 726494 55709 27494 782203 92.88
Unknown Face-to-Face 1590056 45277 268344 1635333 97.23
Home Visit 160811 28131 72695 188942 85.11
Telephone 1626647 43107 80852 1669754 97.42
Unknown 5262142 155628 331566 5417770 97.13
Video/Online 3285 83 63 3368 97.54
In [174]:
# https://www.bbc.co.uk/news/articles/cvgpp0ze478o

Twitter text analysis¶

In [175]:
# Load data set 
twts = pd.read_csv('tweets.csv')
# View DataFrame
twts.sort_values('tweet_favorite_count', ascending=False).head(3)
Out[175]:
tweet_id tweet_full_text tweet_entities tweet_entities_hashtags tweet_metadata tweet_retweet_count tweet_favorite_count tweet_favorited tweet_retweeted tweet_lang
117 1567583855422611461 Lipid-Lowering Drugs\n\n#TipsForNewDocs #MedEd... {'hashtags': [{'text': 'TipsForNewDocs', 'indi... #TipsForNewDocs, #MedEd, #MedTwitter, #medicin... {'iso_language_code': 'en', 'result_type': 're... 12 42 False False en
433 1567582427719282689 You ready for $JCO @_JennyCo ❤️\n\n#Healthcare... {'hashtags': [{'text': 'Healthcare', 'indices'... #Healthcare {'iso_language_code': 'en', 'result_type': 're... 1 28 False False en
205 1567634936341069826 How health insurance works 😂 \n\n#comedy #adul... {'hashtags': [{'text': 'comedy', 'indices': [3... #comedy, #adulting, #healthcare {'iso_language_code': 'en', 'result_type': 're... 5 20 False False en
In [176]:
# Explore the metadata.
twts.dtypes
Out[176]:
tweet_id                    int64
tweet_full_text            object
tweet_entities             object
tweet_entities_hashtags    object
tweet_metadata             object
tweet_retweet_count         int64
tweet_favorite_count        int64
tweet_favorited              bool
tweet_retweeted              bool
tweet_lang                 object
dtype: object

Clean and explore the data

In [177]:
# Check data types and null values
twts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1174 entries, 0 to 1173
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   tweet_id                 1174 non-null   int64 
 1   tweet_full_text          1174 non-null   object
 2   tweet_entities           1174 non-null   object
 3   tweet_entities_hashtags  1007 non-null   object
 4   tweet_metadata           1174 non-null   object
 5   tweet_retweet_count      1174 non-null   int64 
 6   tweet_favorite_count     1174 non-null   int64 
 7   tweet_favorited          1174 non-null   bool  
 8   tweet_retweeted          1174 non-null   bool  
 9   tweet_lang               1174 non-null   object
dtypes: bool(2), int64(3), object(5)
memory usage: 75.8+ KB
In [178]:
# Drop rows containing nulls
twts.dropna(inplace=True)
In [179]:
# Check deletions worked
twts.isnull().sum()
Out[179]:
tweet_id                   0
tweet_full_text            0
tweet_entities             0
tweet_entities_hashtags    0
tweet_metadata             0
tweet_retweet_count        0
tweet_favorite_count       0
tweet_favorited            0
tweet_retweeted            0
tweet_lang                 0
dtype: int64
In [180]:
# Check deletions worked
twts.isnull().sum()
Out[180]:
tweet_id                   0
tweet_full_text            0
tweet_entities             0
tweet_entities_hashtags    0
tweet_metadata             0
tweet_retweet_count        0
tweet_favorite_count       0
tweet_favorited            0
tweet_retweeted            0
tweet_lang                 0
dtype: int64
In [181]:
# Check for unique values
twts.nunique()
Out[181]:
tweet_id                   1007
tweet_full_text             857
tweet_entities              854
tweet_entities_hashtags     710
tweet_metadata                1
tweet_retweet_count          35
tweet_favorite_count         20
tweet_favorited               1
tweet_retweeted               1
tweet_lang                    1
dtype: int64
In [182]:
# Create new DataFrame containing only the text.
twt_2 = twts[['tweet_full_text','tweet_entities_hashtags']]
# View the DataFrame.
twt_2.head(10)
Out[182]:
tweet_full_text tweet_entities_hashtags
16 RT @imedverse: I.V Drug Calculations Cheat She... #TipsForNewDocs, #MedEd, #MedTwitter, #medicin...
17 RT @Khulood_Almani: 🔟#Applications of #AI in #... #Applications, #AI, #healthcare, #digitalhealt...
18 RT @Khulood_Almani: 🔟#Applications of #AI in #... #Applications, #AI, #healthcare, #digitalhealt...
19 RT @Khulood_Almani: 🔟#Applications of #AI in #... #Applications, #AI, #healthcare, #digitalhealt...
20 RT @Khulood_Almani: 🔟#Applications of #AI in #... #Applications, #AI, #healthcare, #digitalhealt...
21 RT @Khulood_Almani: #Healthcare #DigitalTransf... #Healthcare, #DigitalTransformation, #digitalh...
22 RT @SoniaFurstenau: Germany has one of the top... #healthcare
23 RT @SoniaFurstenau: Germany has one of the top... #healthcare
24 RT @SoniaFurstenau: Germany has one of the top... #healthcare
25 RT @imedverse: Features of Lung Diseases\n\n#p... #pulmonary, #pulmtwitter, #pulmonology, #pulmo...
In [183]:
# find individual hashtags
hashtags_list = twt_2['tweet_full_text'].str.findall(r'#\w+')
# create flattened list
all_hashtags = [x for sublist in hashtags_list for x in sublist]
#print(all_hashtags)
In [184]:
# Create a counter object (dictionary-like)
from collections import Counter
counts = Counter(all_hashtags)
In [185]:
# Transform counter object into a DataFrame
hashtag_df = pd.DataFrame(counts.items(), columns=['Hashtag','Count']).sort_values(by='Count', ascending=False)
In [186]:
# Display records where the count is larger than 25
hashtag_abv_10 = hashtag_df[hashtag_df['Count'] > 25]
hashtag_abv_10
Out[186]:
Hashtag Count
8 #healthcare 599
15 #Healthcare 198
55 #health 70
28 #HealthCare 47
7 #AI 45
3 #medicine 42
203 #job 38
346 #strategy 31
4 #medical 30
In [187]:
# Create the plot.
plot_format()
sns.barplot(data=hashtag_abv_10, y='Hashtag', x='Count')
plt.xticks(rotation=90)
plt.title('Most popular hashtags')
plt.show()
No description has been provided for this image
In [188]:
# Display records where the count is larger than 10 but less than 47 (to preclude health/healthcare)
hashtag_mid = hashtag_df[(hashtag_df['Count'] > 10) & (hashtag_df['Count'] <= 45)]
In [189]:
# Create a new plot filtering out health / healthcare
plt.figure(figsize=(10,8))
sns.barplot(data=hashtag_mid, y='Hashtag', x='Count')
plt.xticks(rotation=90)
plt.title('Most popular hashtags (minus "health")')
plt.savefig('slide_7.png',dpi=150, bbox_inches='tight')
plt.show()
# View the barplot.
No description has been provided for this image

I removed health as it was an umbrella term but the above leads to few useful insights. One recommendation would be to review all the #TipsForNewDocs and perhaps #PatientCare for possible improvements. As a data analyst I was most interested by reviewing the Python hashtags (see below).

In [190]:
# Filter for rows containing #TipsForNewDocs
tips_tweets = twts[twts['tweet_entities_hashtags'].str.contains('#Python', na=False)]
tips_tweets.head()
Out[190]:
tweet_id tweet_full_text tweet_entities tweet_entities_hashtags tweet_metadata tweet_retweet_count tweet_favorite_count tweet_favorited tweet_retweeted tweet_lang
21 1567609182001545221 RT @Khulood_Almani: #Healthcare #DigitalTransf... {'hashtags': [{'text': 'Healthcare', 'indices'... #Healthcare, #DigitalTransformation, #digitalh... {'iso_language_code': 'en', 'result_type': 're... 107 0 False False en
129 1567655135421108226 RT @pratititech: #AI tools in #healthcare!\n\n... {'hashtags': [{'text': 'AI', 'indices': [17, 2... #AI, #healthcare, #Python, #Nodejs, #MachineLe... {'iso_language_code': 'en', 'result_type': 're... 9 0 False False en
130 1567654980970217472 RT @pratititech: #AI tools in #healthcare!\n\n... {'hashtags': [{'text': 'AI', 'indices': [17, 2... #AI, #healthcare, #Python, #Nodejs, #MachineLe... {'iso_language_code': 'en', 'result_type': 're... 9 0 False False en
131 1567654978881470464 RT @pratititech: #AI tools in #healthcare!\n\n... {'hashtags': [{'text': 'AI', 'indices': [17, 2... #AI, #healthcare, #Python, #Nodejs, #MachineLe... {'iso_language_code': 'en', 'result_type': 're... 9 0 False False en
132 1567654962318180354 RT @pratititech: #AI tools in #healthcare!\n\n... {'hashtags': [{'text': 'AI', 'indices': [17, 2... #AI, #healthcare, #Python, #Nodejs, #MachineLe... {'iso_language_code': 'en', 'result_type': 're... 9 0 False False en

Key Findings¶

Overall Resource Utilisation: 95.3% system-wide, with seasonal peaks in autumn (October-November) and March. During peak periods, 10-10.3% of total annual appointments occur monthly, with utilisation paradoxically lowest in October (94.6%) when demand is highest.
Service Distribution: 98% of appointments occur on weekdays, with 91.5% delivered through General Practices. National categories show: Clinical Triage (14%), General Consultation Acute (18%), and General Consultation Routine (33%). The high proportion of Clinical Triage indicates significant demand pressure and access challenges.
Seasonal Patterns: Autumn and March peaks create substantial pressure on all staff categories, with work reallocation proving difficult during these periods. GP-driven demand particularly evident in March, while autumn pressures affect broader practice teams.

Conclusion 1: Booking Lead Time Impact¶

Finding: Clear inverse relationship between booking lead time and utilisation - shorter intervals between booking and appointments correlate with reduced missed appointments.
Recommendation: Limit appointments booked more than 28 days in advance. Develop automated contact systems (potentially AI-enabled) to confirm patient requirements for advance bookings.
Follow-up: Investigate the clinical nature and necessity of missed long-lead appointments.

In [191]:
# Super simple version
fig = go.Figure([
    go.Bar(x=lead_time['booking_lead_time'], y=lead_time['Attended'], 
           name='Attended', marker_color='orange'),
    go.Bar(x=lead_time['booking_lead_time'], y=lead_time['DNA'], 
           name='DNA', marker_color='blue'),
    go.Scatter(x=lead_time['booking_lead_time'], y=lead_time['Utilisation(%)'], 
               mode='lines+markers', name='Utilisation(%)', yaxis='y2', 
               line_color='green')
])
fig.update_layout(
    yaxis2=dict(overlaying='y', side='right'),
    title='Attendance and booking lead times (All Staff)',
    barmode='stack'
)

# Apply your aesthetic formatting
format_plotly_chart(fig).show()

Conclusion 2: Practitioner and Appointment Mode Utilisation¶

Finding: Healthcare practitioner type significantly affects utilisation, particularly for face-to-face appointments. GP attendance rates reach 96.5% compared to 92% for other practice staff, representing 2.5 million lost hours annually for non-GP face-to-face appointments.
Recommendation: Investigate underlying causes of lower utilisation for non-GP face-to-face appointments, particularly whether these represent routine health checks that patients deprioritise.
Follow-up: Analyse appointment types and patient demographics for non-GP missed appointments before implementing targeted interventions.

In [192]:
# Determine % of appointments cancelled relating to hcp_type
{
    "tags": [
        "hide-input",
    ]
}
hcp_f2f = (ar
    .query("hcp_type != 'Unknown' and appointment_status != 'Unknown'")  # Filter out Unknowns
    .pivot_table(
        values='appointment_count',
        index=['hcp_type', 'appointment_mode'],
        columns='appointment_status',
        aggfunc='sum'
    )
    .assign(
        total=lambda x: x['Attended'] + x['DNA'],
        utilisation_pct=lambda x: (x['Attended'] / x['total'] * 100).round(2)
    )
    .reset_index()
    .query("appointment_mode == 'Face-to-Face'")
)

hcp_f2f
Out[192]:
appointment_status hcp_type appointment_mode Attended DNA total utilisation_pct
0 GP Face-to-Face 70089048 2599222 72688270 96.42
5 Other Practice staff Face-to-Face 92755796 8099216 100855012 91.97
In [193]:
# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# First subplot - Utilisation percentage
sns.barplot(data=hcp_f2f, 
            y='utilisation_pct', 
            x='hcp_type', 
            hue='appointment_mode',
            ax=ax1,
            )
ax1.set_title('Face-to-Face Utilisation(%)')
ax1.set_xlabel('Staff Type')
ax1.set_ylabel('Utilisation (%)')

# Add labels to first subplot
for c in ax1.containers:
    ax1.bar_label(c, fmt='%.2f%%')

# Second subplot - Total appointments
sns.barplot(data=hcp_f2f, 
            y='DNA', 
            x='hcp_type', 
            hue='appointment_mode',
            ax=ax2,
            )
ax2.set_title('Missed Face-to-Face Appointments')
ax2.set_xlabel('Staff Type')
ax2.set_ylabel('Total Appointments')

# Remove legend from second subplot
ax2.get_legend().remove()

# Add labels to second subplot
for c in ax2.containers:
    ax2.bar_label(c, fmt=lambda x: f'{x/1e6:.1f}M')

# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()
No description has been provided for this image
In [194]:
fig = go.Figure()

# Add each line directly from the pivot table
fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Face-to-Face', 'Attended (%)')],
    mode='lines+markers',
    name='Face to Face'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index, 
    y=mode_pivot[('Home Visit', 'Attended (%)')],
    mode='lines+markers', 
    name='Home Visit'
))

fig.add_trace(go.Scatter(
    x=mode_pivot.index,
    y=mode_pivot[('Telephone','Attended (%)')],
    mode='lines+markers',
    name='Telephone'
))  
fig.add_trace(go.Scatter(
    x=mode_pivot.index,
    y=mode_pivot[('Video/Online','Attended (%)')],
    mode='lines+markers',
    name='Video'
))
fig.update_layout(
    title='Appointments Attended by Mode (%)',
    xaxis_title='Month',
    yaxis_title='Attended (%)',
    height=500
)

format_plotly_chart(fig).show()

Conclusion 3: Clinical Triage as Capacity Indicator¶

Finding: Clinical triages comprise 14% of appointments, with higher proportions in individual GP surgeries compared to Primary Care Networks (PCNs). This differential suggests PCNs operate more efficiently, as clinical triage typically indicates system pressure.
Recommendation: Accelerate GP surgery integration into PCNs to improve operational efficiency and reduce pressure-driven triaging.
Follow-up: Investigate alternative explanations for triage differentials between service models.

In [195]:
# Sort your data to match the month order before plotting
national_filtered['month'] = pd.Categorical(national_filtered['month'], 
                                            categories=month_order, ordered=True)
# aggregate for plotly
df = national_filtered.groupby(['national_category', 'month'], 
                               as_index=False)['appointment_count'].sum()

# Filter out July from the data
df_no_july = df[df['month'] != 'July']

# Create the line plot
fig = px.line(df_no_july, 
              x='month', 
              y='appointment_count', 
              color='national_category',
              title='Autumn and March Planned Procedures and Clinics')

# Update layout for legend and x-axis rotation
fig.update_layout(
    legend=dict(
        orientation="v",    # vertical orientation
        yanchor="middle",  
        y=0.5,              
        xanchor="left",      
        x=1.2             
    ),
)
format_plotly_chart(fig).show()
In [196]:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=triage2['service_setting'],
    y=triage2['pct'],
    marker_color='blue',  # Direct color assignment
    name='Other'
))
fig.update_layout(
    title='Clinical Triage Appointments by Service Setting',
    xaxis_title='Region',
    yaxis_title='Appointment Count'
)
format_plotly_chart(fig).show()

Conclusion 4: Extended Access Provision (EAP) Seasonal Response¶

Finding: EAP absorbs additional seasonal demand, with autumn increases driven by planned clinics and vaccinations, while March peaks reflect acute consultation needs. This demonstrates system flexibility but also indicates capacity strain during predictable periods.
Recommendation: Explore innovative vaccination delivery methods (micro-needle patches, oral vaccinations currently in clinical trials) through pharmacy networks or home delivery to reduce surgery-based vaccination burden.
Follow-up: Pilot alternative vaccination delivery models within PCN frameworks.

In [197]:
# Create the line plot
fig = px.line(filtered_natcat4)

# Update layout with title and formatting
fig.update_layout(
    title='Extended Access Provision Appointments',
    xaxis_tickangle=45,
    legend=dict(
        orientation="h",  # horizontal legend
        yanchor="top",
        y=-0.4,
        xanchor="left",
        x=0
    ),
    margin=dict(b=100)  # Add bottom margin for legend space
)

# Show the plot
format_plotly_chart(fig).show()

Overall Strategic Conclusion¶

This analysis reveals a primary care system operating at high utilisation (95.3%) but with identifiable inefficiencies that compound during predictable seasonal peaks. The four key recommendations address different layers of the system: operational practices (booking patterns), service delivery models (practitioner-specific utilisation), structural organisation (PCN integration), and innovative capacity management (alternative vaccination delivery). Together, these interventions target the fundamental challenge identified in the research questions: while the system demonstrates adequate overall capacity, resource allocation and utilisation patterns create pressure points that reduce effectiveness. The high proportion of clinical triage (14%) and seasonal capacity strain indicate a system managing demand reactively rather than strategically. The recommendations collectively propose a shift toward more proactive, integrated primary care delivery. Better booking practices reduce waste, PCN integration improves operational efficiency, and innovative service delivery methods address predictable seasonal demands. This multi-layered approach acknowledges that NHS capacity challenges require both immediate operational improvements and longer-term structural changes to create sustainable, patient-centered primary care services.

Limitations: The 3 data sets could not be merged limiting the analysis as it was not possible to utilise all variables in combination.

Accurately evaluating capacity would require more information, such as staff numbers overtime and unpaid hours, to assess whether the staff are already working beyond their capacity. This would enable the calculation of capacity e.g contractual hours x staff numbers.